Dynamic determination of the dimension of PCA calibration models using F‐statistics
Author(s) - Vogt F., Mizaikoff B.
Publication year - 2003
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.813
Subject(s) - principal component analysis, calibration, dimension (graph theory), residual, principal component regression, statistics, mathematics, computer science, algorithm, data mining, pure mathematics
Owing to experimental measurement errors, determining the proper dimension of calibration models is difficult. Cross‐validation is a common approach for this purpose; however, if data evaluation is based on PCA only, without consideration of sample concentrations, this computationally expensive method cannot be applied. In this study a statistical method for determining the proper dimension of PCA calibration models is presented from the viewpoint of multivariate regression analysis, considering only measured data. In this iterative algorithm, individual principal components (PCs) are included stepwise in a reduced model, which is subsequently tested against the full model including all PCs. The comparison is performed by an F‐test on the estimates of the residual variance of a measurement spectrum obtained from the reduced and the full model. The test detects a lack of fit caused by an insufficient number of PCs. If no lack of fit is evident for a certain reduced model, a sufficiently large model is considered to have been found and the inclusion of additional PCs is stopped. Hence the resulting reduced calibration model includes only statistically significant PCs and determines the minimum number of PCs required for a given measurement spectrum. For optimized data evaluation, the algorithm can be applied individually to every measured data vector, such as an optical spectrum of a chemical analyte. The proposed algorithm is initially investigated using simulated data and subsequently applied to three different experimental sets of spectra. It is shown that for synthetic data at reasonable noise levels the correct number of PCs can be determined in most cases. The experimental examples demonstrate that the number of PCs determined by the proposed algorithm is slightly larger than the number a user would select manually by subjective visual inspection. As one result, the algorithm is able to detect small but significant spectroscopic features of experimental data which would otherwise be neglected.
Copyright © 2003 John Wiley & Sons, Ltd.
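The stepwise procedure described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation, and the abstract does not state the exact test statistic. The sketch assumes that PCs are added in order of decreasing calibration variance, that the full model has fewer PCs than spectral channels (so a residual remains), and that lack of fit is judged with the standard extra-sum-of-squares F-test for nested linear models. The function names `fit_pca` and `select_dimension` and the significance level `alpha` are hypothetical choices made for this illustration.

```python
import numpy as np
from scipy import stats


def fit_pca(X_cal):
    """Return the calibration mean and the loadings (as columns) of all PCs."""
    mean = X_cal.mean(axis=0)
    # SVD of the mean-centered calibration spectra (rows = samples).
    # Rows of Vt are loadings, sorted by decreasing singular value.
    _, s, Vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    keep = s > s[0] * 1e-10  # drop numerically null directions
    return mean, Vt[keep].T


def select_dimension(x, mean, loadings, alpha=0.05):
    """Smallest number of PCs showing no significant lack of fit for x.

    Assumption: extra-sum-of-squares F-test of each reduced model
    (first k PCs) against the full model (all m PCs).
    """
    n, m = loadings.shape  # n channels, m PCs in the full model
    if m >= n:
        raise ValueError("full model must leave residual degrees of freedom")
    xc = x - mean
    scores = loadings.T @ xc               # projection onto every PC
    resid_full = xc - loadings @ scores    # residual of the full model
    var_full = (resid_full @ resid_full) / (n - m)
    for k in range(m):
        # Residual sum of squares the reduced model has in excess of the
        # full model: the squared scores of the PCs that were left out.
        ss_extra = scores[k:] @ scores[k:]
        f_stat = (ss_extra / (m - k)) / var_full
        f_crit = stats.f.ppf(1.0 - alpha, m - k, n - m)
        if f_stat <= f_crit:
            return k  # no lack of fit: stop adding PCs
    return m
```

In use, `fit_pca` would be called once on a calibration matrix with one spectrum per row, and `select_dimension` once per measured spectrum, so each spectrum can receive its own model dimension. Because the tests are run sequentially at level `alpha`, the sketch will occasionally include a PC that is not needed.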