Non‐parametric statistical methods for multivariate calibration model selection and comparison

Author(s) - Thomas Edward V.

Publication year - 2003

Publication title - Journal of Chemometrics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.47

H-Index - 92

eISSN - 1099-128X

pISSN - 0886-9383

DOI - 10.1002/cem.833

Subject(s) - latent variable, latent variable model, multivariate statistics, partial least squares regression, calibration, statistics, parametric statistics, principal component regression, latent class model, mathematics, computer science, principal component analysis, regression analysis, model selection

Model selection is an important issue when constructing multivariate calibration models using methods based on latent variables (e.g. partial least squares regression and principal component regression). It is important to select an appropriate number of latent variables to build an accurate and precise calibration model. Inclusion of too few latent variables can result in a model that is inaccurate over the complete space of interest. Inclusion of too many latent variables can result in a model that produces noisy predictions through incorporation of low‐order latent variables that have little or no predictive value. Commonly used metrics for selecting the number of latent variables are based on the predicted error sum of squares (PRESS) obtained via cross‐validation. In this paper a new approach for selecting the number of latent variables is proposed. In this new approach the prediction errors of individual observations (obtained from cross‐validation) are compared across models incorporating varying numbers of latent variables. Based on these comparisons, non‐parametric statistical methods are used to select the simplest model (least number of latent variables) that provides prediction quality that is indistinguishable from that provided by more complex models. Unlike methods based on PRESS, this new approach is robust to the effects of anomalous observations. More generally, the same approach can be used to compare the performance of any models that are applied to the same data set where reference values are available. The proposed methodology is illustrated with an industrial example involving the prediction of gasoline octane numbers from near‐infrared spectra. Published in 2004 by John Wiley & Sons, Ltd.
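The selection rule described in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the specific non-parametric test, so this sketch assumes a one-sided paired sign test on absolute cross-validation errors. The function names (`sign_test_pvalue`, `select_n_latent`), the significance level `alpha=0.05`, and the example errors are all hypothetical and are not taken from the paper.

```python
from math import comb


def sign_test_pvalue(simple_abs_err, complex_abs_err):
    """One-sided exact sign test (an assumed choice of non-parametric test).

    Returns the probability, under the null hypothesis that both models are
    equally good, of seeing the simpler model lose on at least as many
    observations as it actually did. Ties are dropped.
    """
    pairs = list(zip(simple_abs_err, complex_abs_err))
    worse = sum(s > c for s, c in pairs)   # simpler model has the larger error
    better = sum(s < c for s, c in pairs)  # simpler model has the smaller error
    n = worse + better
    if n == 0:
        return 1.0  # all ties: no evidence against the simpler model
    return sum(comb(n, i) for i in range(worse, n + 1)) / 2 ** n


def select_n_latent(abs_errors, alpha=0.05):
    """Pick the fewest latent variables not significantly worse than any
    more complex candidate.

    abs_errors maps a number of latent variables to the list of absolute
    cross-validation prediction errors, one per observation, in the same
    observation order for every model.
    """
    if not abs_errors:
        raise ValueError("no candidate models supplied")
    sizes = sorted(abs_errors)
    for k in sizes:
        rivals = [m for m in sizes if m > k]
        if all(sign_test_pvalue(abs_errors[k], abs_errors[m]) >= alpha
               for m in rivals):
            return k
    return sizes[-1]  # not reached: the largest model has no rivals


# Hypothetical absolute cross-validation errors for 10 observations.
cv_abs_errors = {
    1: [1.0] * 10,           # clearly underfitted
    2: [0.2] * 10,
    3: [0.19, 0.21] * 5,     # no consistent gain over 2 latent variables
}
chosen = select_n_latent(cv_abs_errors)
print(chosen)
```

Because the sign test uses only which model has the smaller error on each observation, not by how much, a single anomalous observation contributes one "win" or "loss" regardless of its magnitude. That is one way to obtain the robustness to anomalous observations that the abstract contrasts with PRESS-based selection.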
