Framework for regression‐based missing data imputation methods in on‐line MSPC

Author(s) - Arteaga Francisco, Ferrer Alberto

Publication year - 2005

Publication title - Journal of Chemometrics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.47

H-Index - 92

eISSN - 1099-128X

pISSN - 0886-9383

DOI - 10.1002/cem.946

Subject(s) - missing data, imputation (statistics), computer science, data mining, multivariate statistics, regression, principal component analysis, regression analysis, principal component regression, statistical model, statistics, artificial intelligence, machine learning, mathematics

Missing data are a critical issue in on‐line multivariate statistical process control (MSPC). Among the methods for estimating the scores of future incomplete multivariate observations from an existing principal component analysis (PCA) model, the most statistically efficient are those that estimate the scores of the new incomplete observation as the prediction from a regression model. We call these regression‐based methods. Several approximations have been proposed in the literature to overcome the singularity or ill‐conditioning problems that some of these methods can suffer because of missing data. The problem is particularly acute in on‐line batch process monitoring. To ease comparison of the statistical performance of these methods and to clarify how they relate to one another, this paper proposes a framework in which all of these regression‐based methods are written as a single expression that is a function of a key matrix. From this framework, a statistical performance index (PRESV) is introduced to compare the statistical efficiency of the framework members and to predict the impact of specific missing data combinations on score estimation without requiring real data. The results are illustrated by application to several continuous and batch industrial data sets. Copyright © 2005 John Wiley & Sons, Ltd.
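
To make the idea of a regression‐based score estimate concrete, here is a minimal Python sketch. It is not taken from the paper and does not reproduce its unified expression, its key matrix, or the PRESV index. It assumes NumPy, a PCA model fitted on complete, mean-centred training data, and missing values marked as NaN. The names `fit_pca` and `estimate_scores` are hypothetical. The estimator shown is the plain conditional-mean regression of the scores on the variables that were actually measured.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit PCA on complete training data X (N x K).

    Returns the column means, the loadings P (K x A) and the
    sample covariance matrix S (K x K) of the training data.
    """
    mean = X.mean(axis=0)
    S = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)              # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the A largest
    return mean, eigvecs[:, order], S


def estimate_scores(z, mean, P, S):
    """Estimate the A scores of a new observation z (length K).

    Missing entries of z are NaN. The scores are predicted by regressing
    them on the known variables: cov(t, z_known) @ inv(S_known) @ z_known.
    """
    known = ~np.isnan(z)
    z_known = z[known] - mean[known]
    S_kk = S[np.ix_(known, known)]   # covariance of the known variables
    cov_t_z = P.T @ S[:, known]      # covariance between scores and known variables
    # Assumption: S_kk is well conditioned. With many collinear variables it may
    # not be, which is the ill-conditioning issue the abstract refers to.
    return cov_t_z @ np.linalg.solve(S_kk, z_known)
```

With no missing entries this estimate reduces to the ordinary projection of the centred observation onto the loadings, which is a quick sanity check on the implementation. Calling `estimate_scores` on a vector that contains NaN values returns one estimated score per retained component.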
