Concept and role of extreme objects in PCA/SIMCA
Author(s) -
Pomerantsev Alexey L.,
Rodionova Oxana Ye
Publication year - 2014
Publication title -
Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2506
Subject(s) - outlier, principal component analysis, robust principal component analysis, pattern recognition (psychology), artificial intelligence, data set, set (abstract data type), estimator, computer science, calibration, multidimensional scaling, extreme learning machine, mathematics, data mining, statistics, artificial neural network, programming language
To construct a reliable decision area in the soft independent modeling by class analogy (SIMCA) method, the calibration data must be analyzed to reveal objects of special types, such as extremes and outliers. This requires a thorough statistical analysis of the score and orthogonal distances. The distance values should be treated like any other data acquired in the experiment, and their distributions should be estimated by a data‐driven method, such as the method of moments or similar. The scaled chi‐squared distribution seems to be the first candidate in such an assessment. This makes it possible to construct a two‐level decision area, with extreme and outlier thresholds, both for a regular data set and in the presence of outliers. We suggest applying classical principal component analysis (PCA) followed by enhanced robust estimators for both the scaling factor and the number of degrees of freedom. A special diagnostic tool, called the extreme plot, is proposed for the analysis of calibration objects. Extreme objects play an important role in data analysis: they are a mandatory attribute of any data set. The advocated dual data‐driven PCA/SIMCA (DD‐SIMCA) approach has performed properly in the analysis of simulated and real‐world data in both regular and contaminated cases. DD‐SIMCA has also been compared with robust principal component analysis, which is a fully robust method. Copyright © 2013 John Wiley & Sons, Ltd.
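
The two‐level decision area described in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so everything below is an assumption made for illustration: each distance is modeled as x ~ (x0/N)·χ²(N), the scaling factor x0 and the degrees of freedom N are fitted by the classical (non‐robust) method of moments, the two scaled distances are summed into a full distance, and the extreme and outlier cut‐offs are taken as the 1 − α and (1 − γ)^(1/I) quantiles for I calibration objects. The function names are hypothetical, and the chi‐squared quantile uses the Wilson‐Hilferty approximation so that only the standard library is needed.

```python
from statistics import NormalDist, fmean, variance


def chi2_quantile(p, dof):
    """Approximate chi-squared quantile (Wilson-Hilferty); dof may be fractional."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3


def moment_estimates(distances):
    """Method-of-moments fit of x ~ (x0 / n) * chi2(n).

    For this model E[x] = x0 and Var[x] = 2 * x0**2 / n, so
    x0 = mean and n = 2 * mean**2 / variance.
    """
    x0 = fmean(distances)
    n = 2.0 * x0 ** 2 / variance(distances)
    return x0, n


def decision_thresholds(h, v, alpha=0.01, gamma=0.01):
    """Return full distances plus extreme and outlier cut-offs.

    h, v: score and orthogonal distances of the calibration objects
    (assumed to be computed beforehand from a PCA model).
    """
    h0, n_h = moment_estimates(h)
    v0, n_v = moment_estimates(v)
    dof = n_h + n_v
    # Full distance: sum of the two scaled distances, assumed ~ chi2(dof).
    full = [n_h * hi / h0 + n_v * vi / v0 for hi, vi in zip(h, v)]
    c_extreme = chi2_quantile(1.0 - alpha, dof)
    # Outlier level is corrected for the number of calibration objects.
    c_outlier = chi2_quantile((1.0 - gamma) ** (1.0 / len(h)), dof)
    return full, c_extreme, c_outlier
```

Under these assumptions, a calibration object whose full distance exceeds `c_extreme` but not `c_outlier` would be flagged as an extreme, and one that exceeds `c_outlier` as an outlier. The robust estimators advocated in the abstract would replace `moment_estimates` when the data are contaminated.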