The prediction error in CLS and PLS: the importance of feature selection prior to multivariate calibration
Author(s) - Nadler, Boaz; Coifman, Ronald R.
Publication year - 2005
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.915
Subject(s) - partial least squares regression , mean squared error , chemometrics , feature selection , calibration , multivariate statistics , mathematics , statistics , pattern recognition (psychology) , algorithm , computer science , artificial intelligence , machine learning
Abstract - Classical least squares (CLS) and partial least squares (PLS) are two common multivariate regression algorithms in chemometrics. This paper presents an asymptotically exact mathematical analysis of the mean squared error of prediction of CLS and PLS under the linear mixture model commonly assumed in spectroscopy. For CLS regression with a very large calibration set the root mean squared error is approximately equal to the noise per wavelength divided by the length of the net analyte signal vector. It is shown, however, that for a finite training set with n samples in p dimensions there are additional error terms that depend on σ²p²/n², where σ is the noise level per co‐ordinate. Therefore in the 'large p, small n' regime, common in spectroscopy, these terms can be quite large and even dominate the overall prediction error. It is demonstrated both theoretically and by simulations that dimensional reduction of the input data via their compact representation with a few features, selected for example by adaptive wavelet compression, can substantially decrease these effects and recover the asymptotic error. This analysis provides a theoretical justification for the need to perform feature selection (dimensional reduction) of the input data prior to application of multivariate regression algorithms. Copyright © 2005 John Wiley & Sons, Ltd.
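The effect described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's own experiment: it assumes a two-component linear mixture model with hypothetical Gaussian pure-component spectra, uses least-squares calibration on the raw p = 500 wavelengths versus on 10 crude block-averaged features (a stand-in for adaptive wavelet compression), and compares test-set RMSE in the 'large p, small n' regime.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_train, n_test, sigma = 500, 50, 200, 0.05  # large p, small n

# Hypothetical pure-component spectra: two Gaussian bands on [0, 1]
w = np.linspace(0.0, 1.0, p)
s1 = np.exp(-((w - 0.3) ** 2) / 0.01)
s2 = np.exp(-((w - 0.6) ** 2) / 0.02)
S = np.vstack([s1, s2])

def simulate(n):
    """Linear mixture model: spectrum = c1*s1 + c2*s2 + iid noise."""
    c = rng.uniform(0.0, 1.0, size=(n, 2))
    X = c @ S + sigma * rng.standard_normal((n, p))
    return X, c[:, 0]  # predict concentration of analyte 1

X_tr, y_tr = simulate(n_train)
X_te, y_te = simulate(n_test)

def ls_rmse(Xtr, ytr, Xte, yte):
    """Fit least squares (minimum-norm if p > n) and report test RMSE."""
    b, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    return float(np.sqrt(np.mean((Xte @ b - yte) ** 2)))

# Crude dimensional reduction: average contiguous blocks of 50 wavelengths
def compress(X):
    return X.reshape(len(X), 10, 50).mean(axis=2)

err_full = ls_rmse(X_tr, y_tr, X_te, y_te)
err_comp = ls_rmse(compress(X_tr), y_tr, compress(X_te), y_te)
print(f"RMSE, all {p} wavelengths : {err_full:.4f}")
print(f"RMSE, 10 features        : {err_comp:.4f}")
```

With n < p the raw-wavelength fit interpolates the calibration noise, while the compressed features both shrink p and average down the per-feature noise, so the second RMSE comes out much smaller, consistent with the σ²p²/n² error terms the paper derives.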