z-logo
Premium
Cross‐validation in PCA models with the element‐wise k ‐fold ( ekf ) algorithm: theoretical aspects
Author(s) -
Camacho José,
Ferrer Alberto
Publication year - 2012
Publication title -
journal of chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2440
Subject(s) - imputation (statistics) , cross validation , principal component analysis , extended kalman filter , toolbox , missing data , computer science , algorithm , data mining , mathematics , artificial intelligence , kalman filter , machine learning , programming language
Cross‐validation has become one of the principal methods to adjust the meta‐parameters in predictive models. Extensions of the cross‐validation idea have been proposed to select the number of components in principal components analysis (PCA). The element‐wise k ‐fold ( ekf ) cross‐validation is among the most used algorithms for principal components analysis cross‐validation. This is the method programmed in the PLS_Toolbox, and it has been stated to outperform other methods under most circumstances in a numerical experiment. The ekf algorithm is based on missing data imputation, and it can be programmed using any method for this purpose. In this paper, the ekf algorithm with the simplest missing data imputation method, trimmed score imputation, is analyzed. A theoretical study is driven to identify in which situations the application of ekf is adequate and, more importantly, in which situations it is not. The results presented show that the ekf method may be unable to assess the extent to which a model represents a test set and may lead to discard principal components with important information. On a second paper of this series, other imputation methods are studied within the ekf algorithm. Copyright © 2012 John Wiley & Sons, Ltd.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here