Evolutionary variable selection in regression and PLS analyses
Author(s) - Kubinyi, Hugo
Publication year - 1996
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/(sici)1099-128x(199603)10:2<119::aid-cem409>3.0.co;2-4
Subject(s) - regression analysis, regression, variable (mathematics), set (abstract data type), selection (genetic algorithm), local optimum, feature selection, computer science, mathematics, artificial intelligence, statistics, machine learning, mathematical analysis, programming language
Evolutionary and genetic algorithms are powerful tools for searching for global optima of complex functions. An evolutionary approach, the MUSEUM (mutation and selection uncover models) programme, is applied to various QSAR data sets to prove the general applicability of this approach for variable selection in regression and PLS analyses. ‘Best’ regression models are found within seconds or a few minutes of calculation time, even for data sets that include large numbers of variables. The MUSEUM algorithm starts from an arbitrary model and randomly adds variables to, or eliminates variables from, this model. Any ‘better’ model, as defined by a certain fitness criterion, is taken as a new breeding organism, which is then mutated by further variable additions, eliminations or exchanges. In this manner the models improve gradually until a global optimum, or at least a good local optimum, results. In most cases different runs yield several different models. A systematic search for the best models indicates that in all cases the evolutionary search finds the global optima and good local optima. Most often the fit and cross‐validation results of these regression models are better than those of a PLS analysis that includes all variables of the data set. The variables contained in the best regression models are suitable as subsets for PLS analyses, and some of these PLS results are even better than the best regression results.
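
The mutate-and-select loop described in the abstract can be illustrated with a minimal Python sketch. This is not the MUSEUM programme itself: the abstract does not name the fitness criterion, so leave-one-out cross-validated Q² of an ordinary least-squares model is assumed here, and the function names (`loo_q2`, `mutate`, `evolve`), the iteration count and the single-parent acceptance rule are all hypothetical choices made for illustration.

```python
import numpy as np


def loo_q2(X, y, subset):
    """Leave-one-out cross-validated Q^2 of an OLS model (assumed fitness)."""
    n = len(y)
    # Design matrix: intercept plus the selected variables.
    A = np.column_stack([np.ones(n), X[:, sorted(subset)]])
    if A.shape[1] >= n:
        return -np.inf  # too many parameters for the number of objects
    # Hat matrix; the leave-one-out residual is e_i / (1 - h_ii).
    H = A @ np.linalg.pinv(A)
    resid = y - H @ y
    denom = np.clip(1.0 - np.diag(H), 1e-12, None)
    press = np.sum((resid / denom) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def mutate(subset, n_vars, rng):
    """Randomly add, eliminate, or exchange one variable."""
    child = set(subset)
    outside = [j for j in range(n_vars) if j not in child]
    moves = []
    if outside:
        moves.append("add")
    if len(child) > 1:
        moves.append("drop")
    if outside and child:
        moves.append("swap")
    if not moves:
        return child
    move = moves[int(rng.integers(len(moves)))]
    if move in ("drop", "swap"):
        members = sorted(child)
        child.remove(members[int(rng.integers(len(members)))])
    if move in ("add", "swap"):
        child.add(outside[int(rng.integers(len(outside)))])
    return child


def evolve(X, y, n_iter=2000, seed=0):
    """Keep any mutated model that beats the current one on the fitness."""
    rng = np.random.default_rng(seed)
    n_vars = X.shape[1]
    best = {int(rng.integers(n_vars))}  # arbitrary starting model
    best_fit = loo_q2(X, y, best)
    for _ in range(n_iter):
        child = mutate(best, n_vars, rng)
        fit = loo_q2(X, y, child)
        if fit > best_fit:  # the 'better' model becomes the new parent
            best, best_fit = child, fit
    return sorted(best), best_fit
```

Calling `evolve(X, y, seed=s)` with several values of `s` mirrors the abstract's observation that different runs can end in different models; comparing the returned subsets shows which variables recur across good local optima.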