Variable selection in random calibration of near‐infrared instruments: ridge regression and partial least squares regression settings
Author(s) - Gusnanto Arief, Pawitan Yudi, Huang Jian, Lane Bill
Publication year - 2003
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.787
Subject(s) - partial least squares regression , calibration , statistics , regression analysis , regression , mathematics , selection (genetic algorithm) , feature selection , data set , elastic net regularization , mean squared error , ridge , set (abstract data type) , computer science , artificial intelligence , paleontology , biology , programming language
Abstract - Standard methods for calibration of near‐infrared instruments, such as partial least‐squares (PLS) and ridge regression (RR), typically use the full set of wavelengths in the model. In this paper we investigate the effect of variable (wavelength) selection for these two methods on the model prediction. For RR the selection is optimized with respect to the ridge parameter, the number of variables and the configuration of the variables in the model. A fast iterative computational algorithm is developed for the purpose of this optimization. For PLS the selection is optimized with respect to the number of components, the number of variables and the configuration of the variables. We use three real data sets in this study: processed milk from the market, milk from a dairy farm and milk from the production line of a milk processing factory. The quantity of interest is the concentration of fat in the milk. The observations are randomly split into estimation and validation sets. Optimization is based on the mean square prediction error computed on the validation set. The results indicate that the wavelength selection will not always give better prediction than using all of the available wavelengths. Investigation of the information in the spectra is necessary to determine whether all of them are relevant to the objective of the model. Copyright © 2003 John Wiley & Sons, Ltd.
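
The ridge regression setting described in the abstract (random split into estimation and validation sets, then choosing the ridge parameter and the variable subset by validation mean square prediction error) can be illustrated with a minimal sketch. The abstract does not specify the paper's fast iterative algorithm, so the greedy backward elimination below is an assumed stand-in, not the authors' method. The function names (`ridge_fit`, `mspe`, `backward_select`), the grid of ridge parameters and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge fit on centered data; returns (intercept, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta

def mspe(X_est, y_est, X_val, y_val, cols, lam):
    """Mean square prediction error on the validation set for one variable subset."""
    b0, beta = ridge_fit(X_est[:, cols], y_est, lam)
    resid = y_val - (b0 + X_val[:, cols] @ beta)
    return float(np.mean(resid ** 2))

def backward_select(X_est, y_est, X_val, y_val, lams):
    """Greedy backward elimination (an assumption, not the paper's algorithm)."""
    cols = list(range(X_est.shape[1]))
    best_err, best_lam = min((mspe(X_est, y_est, X_val, y_val, cols, l), l) for l in lams)
    best_cols = cols[:]
    while len(cols) > 1:
        # Try dropping each remaining variable; keep the drop with the lowest MSPE.
        trials = []
        for j in cols:
            sub = [c for c in cols if c != j]
            err, lam = min((mspe(X_est, y_est, X_val, y_val, sub, l), l) for l in lams)
            trials.append((err, lam, sub))
        err, lam, cols = min(trials, key=lambda t: t[0])
        if err < best_err:
            best_err, best_lam, best_cols = err, lam, cols[:]
    return best_cols, best_lam, best_err

# Synthetic stand-in for spectra: 8 "wavelengths", only the first 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)

# Random split into estimation and validation sets.
idx = rng.permutation(60)
est, val = idx[:40], idx[40:]
lams = [0.01, 0.1, 1.0, 10.0]

all_cols = list(range(X.shape[1]))
full_err = min(mspe(X[est], y[est], X[val], y[val], all_cols, l) for l in lams)
cols, lam, err = backward_select(X[est], y[est], X[val], y[val], lams)
print("selected variables:", cols, "ridge parameter:", lam)
print("validation MSPE, selected vs. full:", err, full_err)
```

Because the same validation set is used both to select variables and to report the error, the selected subset can never look worse than the full set in this sketch. An honest comparison of the kind the abstract reports would need an additional held-out set or repeated random splits.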
