Using elastic net regression to perform spectrally relevant variable selection
Author(s) - Giglio Can, Brown Steven D.
Publication year - 2018
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.3034
Subject(s) - elastic net regularization , partial least squares regression , feature selection , regression analysis , interpretability , statistics , regression , variable elimination , linear regression , variables , mathematics , multivariate statistics , selection (genetic algorithm) , segmented regression , computer science , artificial intelligence , bayesian multivariate linear regression , inference
Abstract - Multivariate data such as spectra frequently contain measured variables that are uninformative, and removing them requires methods that can identify the informative variables. Partial least squares (PLS) regression may incorporate information from uninformative measured variables, so it is important to select variables before performing the PLS regression. Elastic net (EN) regression can perform variable selection automatically, and it can be used to select groups of correlated variables or either sparse or nonsparse sets of variables. However, the predictive performance of EN regression can be significantly worse than that of competing 1‐step variable selection methods such as variable importance in projection (VIP). In the present work, the use of EN to select variables, followed by conventional PLS regression on the selected variables (EN‐PLS), has been investigated. Variable selection by EN‐PLS was compared with selection by EN regression, sparse PLS regression, VIP, and the selectivity ratio on 2 data sets of visible/near‐infrared spectra. In all cases, the selected wavelengths were compared with reference data. The variables selected by EN‐PLS offered advantages in interpretability and gave more robust prediction performance than full‐spectrum PLS and the other variable selection methods.

This paper reports a method for variable selection that uses an EN regression prior to a second regression by PLS, a 2‐step method termed EN‐PLS. Variables selected by EN‐PLS are compared with those selected by EN regression and by VIP, selectivity ratio, and sparse PLS regression, 3 methods commonly used for variable selection in chemometrics. EN‐PLS is shown to select variables that are more easily interpreted. In addition, EN‐PLS performed more robustly than a PLS regression on all variables, and more robustly than reduced PLS regressions on variables selected either by the sparse PLS algorithm or by VIP selection followed by PLS modeling.
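For reference, the ability of the elastic net to produce sparse, nonsparse, or grouped selections comes from mixing an L1 (lasso) and an L2 (ridge) penalty. One common glmnet-style parametrization is shown below; the paper may use a different but equivalent form, and the symbols here are assumptions rather than the authors' notation. Setting alpha = 1 gives the lasso (sparse selection), alpha near 0 approaches ridge regression (no coefficients driven exactly to zero, so a nonsparse set), and intermediate values tend to keep or drop strongly correlated variables together (the grouping effect).

```latex
\hat{\boldsymbol\beta}
  = \arg\min_{\boldsymbol\beta}\;
    \frac{1}{2n}\,\lVert \mathbf{y} - \mathbf{X}\boldsymbol\beta \rVert_2^2
    + \lambda\left[\alpha\,\lVert \boldsymbol\beta \rVert_1
    + \frac{1-\alpha}{2}\,\lVert \boldsymbol\beta \rVert_2^2\right],
  \qquad \lambda \ge 0,\quad 0 \le \alpha \le 1
```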
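The 2‐step EN‐PLS idea (EN selects the variables, then ordinary PLS models only those variables) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `elastic_net_cd`, `pls1_coefficients`, and `en_pls` are hypothetical; the elastic net is fitted by plain coordinate descent on mean-centred data with no scaling; the penalty uses a glmnet-style strength `lam` and L1/L2 mixing weight `alpha`; PLS1 is fitted by NIPALS; and the penalty values, number of components, and synthetic data are arbitrary illustrative choices, not values from the paper. In practice such settings would typically be chosen by cross-validation.

```python
import numpy as np


def elastic_net_cd(X, y, lam, alpha, n_iter=1000, tol=1e-8):
    """Coordinate descent for the glmnet-style elastic net objective
        (1/(2n)) * ||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||^2).
    X and y are assumed to be mean-centred, so no intercept is fitted."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n          # (1/n) * x_j^T x_j for each column
    resid = np.array(y, dtype=float)           # residual y - X b (b starts at zero)
    l1, l2 = lam * alpha, lam * (1.0 - alpha)
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = b[j]
            # correlation of x_j with the partial residual that excludes x_j
            rho = X[:, j] @ resid / n + col_sq[j] * old
            # soft-thresholding (L1 part) followed by ridge-type shrinkage (L2 part)
            new = np.sign(rho) * max(abs(rho) - l1, 0.0) / (col_sq[j] + l2)
            if new != old:
                resid -= X[:, j] * (new - old)
                b[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    return b


def pls1_coefficients(X, y, n_components):
    """PLS1 regression coefficients via NIPALS on mean-centred X and y."""
    Xk, yk = X.copy(), y.copy()
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)                 # unit-length weight vector
        t = Xk @ w                             # score vector
        tt = t @ t
        p_vec = Xk.T @ t / tt                  # X loading
        q = yk @ t / tt                        # y loading
        Xk -= np.outer(t, p_vec)               # deflate X
        yk -= q * t                            # deflate y
        W.append(w)
        P.append(p_vec)
        Q.append(q)
    W, P = np.column_stack(W), np.column_stack(P)
    # coefficients in the centred variable space: B = W (P^T W)^{-1} q
    return W @ np.linalg.solve(P.T @ W, np.array(Q))


def en_pls(X, y, lam, alpha, n_components):
    """Two-step sketch: elastic net picks the variables, unpenalised PLS models them."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # step 1: keep only the variables with nonzero elastic net coefficients
    selected = np.flatnonzero(elastic_net_cd(Xc, yc, lam, alpha))
    if selected.size == 0:
        raise ValueError("elastic net selected no variables; try a smaller lam")
    # step 2: conventional PLS regression on the selected subset only
    n_comp = min(n_components, selected.size)
    coef = np.zeros(X.shape[1])
    coef[selected] = pls1_coefficients(Xc[:, selected], yc, n_comp)
    intercept = y_mean - x_mean @ coef
    return selected, coef, intercept


# usage on synthetic data: 40 "channels", response driven by channels 5-9 only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = X[:, 5:10].sum(axis=1) + 0.1 * rng.normal(size=200)
selected, coef, intercept = en_pls(X, y, lam=0.1, alpha=0.5, n_components=3)
y_hat = X @ coef + intercept
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print("selected channels:", selected, "in-sample R^2:", round(r2, 3))
```

The design choice worth noting is that the PLS step is unpenalised: the EN coefficients are used only to decide which variables enter the model, which is what separates this 2‐step approach from predicting directly with the shrunken EN regression coefficients.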
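VIP, one of the comparison methods, scores each variable by how strongly its PLS weights contribute to the explained variance of y. A minimal NumPy sketch of the commonly used VIP definition follows; the function names `pls1_nipals` and `vip_scores` are hypothetical, the data are synthetic, and the "VIP > 1" cutoff is a widely used rule of thumb, not necessarily the threshold used in the paper.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """NIPALS PLS1 on mean-centred X (n x p) and y (n,).
    Returns weights W (p x a), scores T (n x a) and y-loadings q (a,)."""
    Xk, yk = X.copy(), y.copy()
    W, T, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        q = yk @ t / tt
        Xk -= np.outer(t, Xk.T @ t / tt)       # deflate X with its loading vector
        yk -= q * t                            # deflate y
        W.append(w)
        T.append(t)
        Q.append(q)
    return np.column_stack(W), np.column_stack(T), np.array(Q)


def vip_scores(W, T, q):
    """VIP_j = sqrt(p * sum_a SS_a * (w_ja / ||w_a||)^2 / sum_a SS_a),
    where SS_a = q_a^2 * t_a^T t_a is the y sum of squares explained by component a."""
    p = W.shape[0]
    ss = q ** 2 * (T ** 2).sum(axis=0)         # explained sum of squares per component
    # NIPALS weights are already unit length; normalising again is harmless
    w_norm_sq = (W / np.linalg.norm(W, axis=0)) ** 2
    return np.sqrt(p * (w_norm_sq @ ss) / ss.sum())


# synthetic example: response driven by channels 5-9 only
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 40))
y = X[:, 5:10].sum(axis=1) + 0.1 * rng.normal(size=200)
Xc, yc = X - X.mean(axis=0), y - y.mean()
W, T, q = pls1_nipals(Xc, yc, n_components=3)
vip = vip_scores(W, T, q)
selected = np.flatnonzero(vip > 1.0)           # "VIP > 1" rule of thumb
print("VIP-selected channels:", selected)
```

Because each normalised weight vector has unit length, the squared VIP scores always average to 1 over the variables, which is the usual justification for 1 as the default cutoff.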