Clustering of variables in regression analysis: a comparative study between different algorithms
Author(s) - Hemmateenejad Bahram, Karimi Sadegh, Mobaraki Nabiollah
Publication year - 2013
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2513
Subject(s) - cluster analysis, principal component analysis, pattern recognition (psychology), artificial intelligence, hierarchical clustering, computer science, partial least squares regression, data mining, mathematics, fuzzy clustering, multivariate statistics, projection (relational algebra), projection pursuit, regression analysis, self organizing map, mean squared error, regression, single linkage clustering, statistics, cure data clustering algorithm, algorithm
We have recently suggested using clustering of variables (CLoVA), based on unsupervised pattern recognition, to partition variables into informative and redundant ones. Because data clustering plays a central role in CLoVA, the present study compares the efficiency of different clustering methods, namely the Kohonen self‐organizing map (SOM), principal component analysis, fuzzy c‐means clustering, K‐means clustering, and hierarchical cluster analysis, for clustering spectroscopic data and molecular descriptors to build multivariate calibration and quantitative structure–activity relationship (QSAR) models. To investigate which clustering methods are more efficient for CLoVA, four data sets (three spectroscopic and one QSAR) were analyzed. Most of the CLoVA‐based models obtained by SOM yielded the lowest root mean square errors of cross‐validation and prediction, suggesting a higher efficiency of SOM for clustering variables. In all cases, the results obtained by the CLoVA‐based method were compared with those obtained by conventional principal component regression as well as by partial least squares regression coupled with the genetic algorithm and with the successive projection algorithm. Interestingly, models produced by the CLoVA‐based method were more predictive than those produced by the other methods, as indicated by the lowest root mean square error of prediction. Copyright © 2013 John Wiley & Sons, Ltd.
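
The idea of clustering variables rather than samples can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain K‐means (one of the methods compared above) applied to the standardized columns of a small made-up data matrix, and the function names, the farthest-point initialization, and the toy data are all hypothetical choices for illustration. The abstract does not specify how the resulting clusters enter the regression step, so that part is left out.

```python
import math

def standardize(col):
    """Scale one variable to zero mean and unit variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_variables(X, k, n_iter=50):
    """Cluster the columns of X (rows = samples) with plain K-means.

    Returns one cluster label per variable.
    """
    # Each variable is one point: its standardized profile across samples.
    cols = [standardize(list(c)) for c in zip(*X)]
    # Deterministic farthest-point initialization (an illustrative assumption).
    centers = [cols[0]]
    while len(centers) < k:
        centers.append(max(cols, key=lambda c: min(sq_dist(c, m) for m in centers)))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each variable goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(c, centers[j])) for c in cols]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center becomes the mean profile of its members.
        for j in range(k):
            members = [c for c, lab in zip(cols, labels) if lab == j]
            if members:
                centers[j] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels

# Made-up data: variables 0-2 follow signal a, variables 3-5 follow signal b.
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [3, -1, 4, -1, 5, -9, 2, -6]
e = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]  # small fixed perturbation
X = [[a[i], 2 * a[i] + e[i], a[i] - e[i],
      b[i], 0.5 * b[i] + e[i], b[i] - e[i]] for i in range(8)]

labels = kmeans_variables(X, k=2)
print(labels)
```

Variables that carry nearly the same information end up with the same label, which is the sense in which a cluster of variables can be treated as one group of related predictors.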
