A robust PCR method for high‐dimensional regressors
Author(s) - Hubert Mia, Verboven Sabine
Publication year - 2003
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.783
Subject(s) - principal component analysis, outlier, robust regression, robustness (evolution), principal component regression, multivariate statistics, regression, data set, regression analysis, partial least squares regression, robust principal component analysis, mathematics, statistics, computer science, linear regression, robust statistics, set (abstract data type), pattern recognition (psychology), artificial intelligence, biology, biochemistry, gene, programming language
We consider the multivariate calibration model which assumes that the concentrations of several constituents of a sample are linearly related to its spectrum. Principal component regression (PCR) is widely used for the estimation of the regression parameters in this model. In the classical approach it combines principal component analysis (PCA) on the regressors with least squares regression. However, both stages yield very unreliable results when the data set contains outlying observations. We present a robust PCR (RPCR) method which also consists of two parts. First we apply a robust PCA method for high‐dimensional data on the regressors; then we regress the response variables on the scores using a robust regression method. A robust RMSECV value and a robust R² value are proposed as exploratory tools to select the number of principal components. The prediction error is also estimated in a robust way. Moreover, we introduce several diagnostic plots which are helpful to visualize and classify the outliers. The robustness of RPCR is demonstrated through simulations and the analysis of a real data set. Copyright © 2003 John Wiley & Sons, Ltd.
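
The two-stage structure described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not name the robust PCA or robust regression estimators, so the robust variant below swaps in two simple stand-ins chosen only for illustration: spherical PCA around the spatial median for the first stage, and Huber M-regression fitted by iteratively reweighted least squares for the second. Unlike the estimators a full RPCR method would need, Huber regression does not protect against leverage points. All function names, tuning constants and the simulated data are assumptions, not part of the paper.

```python
import numpy as np

def classical_pcr(X, y, k):
    """Classical PCR: PCA on mean-centered X, then least squares on k scores."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                                   # loadings, shape (p, k)
    gamma = np.linalg.lstsq(Xc @ P, y - y_mean, rcond=None)[0]
    beta = P @ gamma                               # coefficients for original X
    return y_mean - x_mean @ beta, beta

def spatial_median(X, n_iter=100):
    """Weiszfeld iterations for the spatial (L1) median of the rows of X."""
    m = X.mean(axis=0)
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(X - m, axis=1), 1e-12)
        m = (X / d[:, None]).sum(axis=0) / (1.0 / d).sum()
    return m

def huber_irls(T, y, c=1.345, n_iter=50):
    """Huber M-regression of y on T (with intercept) via reweighted least squares."""
    A = np.column_stack([np.ones(len(y)), T])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]    # least squares start
    for _ in range(n_iter):
        r = y - A @ coef
        scale = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12  # MAD scale
        u = np.maximum(np.abs(r) / scale, 1e-12)
        sw = np.sqrt(np.minimum(1.0, c / u))       # square roots of Huber weights
        coef = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return coef

def robust_pcr_sketch(X, y, k):
    """Stand-in robust PCR: spherical PCA on X, then Huber regression on scores."""
    center = spatial_median(X)
    Xc = X - center
    norms = np.maximum(np.linalg.norm(Xc, axis=1), 1e-12)
    # Spherical PCA: project rows onto the unit sphere before extracting directions.
    _, _, Vt = np.linalg.svd(Xc / norms[:, None], full_matrices=False)
    P = Vt[:k].T
    coef = huber_irls(Xc @ P, y, c=1.345)
    beta = P @ coef[1:]
    return coef[0] - center @ beta, beta

# Simulated calibration data (hypothetical): more regressors than samples,
# two latent components, and 10% of the responses shifted by a large constant.
rng = np.random.default_rng(0)
n, p, k = 60, 100, 2
T0 = rng.normal(size=(n, k))
X = T0 @ rng.normal(size=(k, p))
y_clean = 1.0 + T0 @ np.array([2.0, -1.0])
y = y_clean + 0.1 * rng.normal(size=n)
y[:6] += 50.0                                      # vertical outliers

def rmse_on_clean(fit):
    intercept, beta = fit
    return float(np.sqrt(np.mean((intercept + X @ beta - y_clean) ** 2)))

err_classical = rmse_on_clean(classical_pcr(X, y, k))
err_robust = rmse_on_clean(robust_pcr_sketch(X, y, k))
print(f"classical PCR error: {err_classical:.3f}, robust sketch error: {err_robust:.3f}")
```

In this simulation only the responses are contaminated, which is the easiest case for the robust stand-in. Outlying spectra in the regressors would call for a robust PCA and a regression estimator with a higher breakdown point than the ones sketched here.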