Framework for regression‐based missing data imputation methods in on‐line MSPC
Author(s) - Arteaga Francisco, Ferrer Alberto
Publication year - 2005
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.946
Subject(s) - missing data, imputation (statistics), computer science, data mining, multivariate statistics, regression, principal component analysis, regression analysis, principal component regression, statistical model, statistics, artificial intelligence, machine learning, mathematics
Missing data are a critical issue in on‐line multivariate statistical process control (MSPC). Among the different score estimation methods for future incomplete multivariate observations from an existing principal component analysis (PCA) model, the most statistically efficient are those that estimate the scores of the new incomplete observation as the prediction from a regression model. We call these regression‐based methods. Several approximations have been proposed in the literature to overcome the singularity or ill‐conditioning problems that some of these methods can suffer because of missing data. This is particularly acute in on‐line batch process monitoring. To ease the comparison of the statistical performance of these methods and to improve the understanding of their relationships, in this paper we propose a framework that allows these regression‐based methods to be written as a single expression that is a function of a key matrix. From this framework, a statistical performance index (PRESV) is introduced as a way to compare the statistical efficiency of the different framework members and to predict the impact of specific missing data combinations on score estimation without requiring real data. The results are illustrated by application to several continuous and batch industrial data sets. Copyright © 2005 John Wiley & Sons, Ltd.
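The idea of estimating scores "as the prediction from a regression model" can be illustrated with a minimal sketch. It is not the paper's framework or its key-matrix expression, only a generic conditional-mean regression of the PCA scores on the observed variables, written under several assumptions: missing entries are marked with NaN, the training data are complete, and the helper names `fit_pca` and `estimate_scores` are hypothetical. The least-squares solve is a swapped-in choice made here to tolerate an ill-conditioned covariance block; it is not one of the approximations the abstract refers to.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit a PCA model on complete training data (hypothetical helper)."""
    mean = X.mean(axis=0)
    S = np.cov(X, rowvar=False)              # sample covariance of all variables
    eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    P = eigvecs[:, order[:n_components]]     # loadings of the retained components
    theta = eigvals[order[:n_components]]    # variances of the retained scores
    return mean, S, P, theta


def estimate_scores(z, mean, S, P, theta):
    """Estimate scores of an observation with NaN-marked missing entries.

    Regresses the scores on the observed variables only:
        cov(scores, observed) = diag(theta) @ P_obs.T
        var(observed)         = S_obs
    so the prediction is diag(theta) @ P_obs.T @ inv(S_obs) @ z_obs.
    """
    observed = ~np.isnan(z)
    z_obs = z[observed] - mean[observed]
    S_obs = S[np.ix_(observed, observed)]
    P_obs = P[observed, :]
    # Assumption: a least-squares solve stands in for inv(S_obs) so that a
    # nearly singular S_obs does not make the computation fail outright.
    coef = np.linalg.lstsq(S_obs, z_obs, rcond=None)[0]
    return theta * (P_obs.T @ coef)


# Synthetic illustration only: two latent factors driving six variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 6))
X += 0.1 * rng.normal(size=X.shape)
mean, S, P, theta = fit_pca(X, n_components=2)

new_obs = X[0].copy()
new_obs[[1, 4]] = np.nan                      # pretend two sensors failed
print(estimate_scores(new_obs, mean, S, P, theta))
```

When no entries are missing, this regression reduces to the ordinary projection of the centered observation onto the loadings, which gives a quick sanity check for the sketch.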