Treatment of missing values in process data analysis
Author(s) - Imtiaz S. A., Shah S. L.
Publication year - 2008
Publication title - The Canadian Journal of Chemical Engineering
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.404
H-Index - 67
eISSN - 1939-019X
pISSN - 0008-4034
DOI - 10.1002/cjce.20099
Subject(s) - missing data , outlier , univariate , data mining , computer science , multivariate statistics , process (computing) , data analysis , principal component analysis , statistics , artificial intelligence , mathematics , machine learning , operating system
Process data suffer from many different types of imperfections, for example, bad data due to sensor problems, multi-rate sampling, outliers, and compressed data. Since most modelling and data analysis methods are developed for regularly sampled, well-conditioned data sets, the data must be pre-treated before analysis. Traditionally, data conditioning or pre-treatment has been done without taking into account the end use of the data; for example, univariate methods have been used to interpolate bad data even when the data are intended for multivariate analysis. In this paper we treat pre-treatment and data analysis as a single problem and propose data conditioning methods in a multivariate framework. We first review classical process data analysis methods and widely used missing data handling techniques from statistical surveys and biostatistics. The application of these techniques is demonstrated in three instances: (i) principal components analysis (PCA) is extended within a data augmentation (DA) framework to deal with missing values, (ii) an iterative missing data technique is used to synchronize batch process data of uneven length, and (iii) a PCA-based iterative missing data technique is used to restore the correlation structure of compressed data.
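The general idea behind a PCA-based iterative missing data technique, as named in (iii), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function name `iterative_pca_impute`, the column-mean starting values, the fixed number of retained components, and the stopping tolerance are all hypothetical choices. Missing cells (marked as NaN) are filled by repeatedly fitting PCA to the currently completed data, reconstructing the data from the retained components, and overwriting only the missing cells. The iteration is deterministic; a DA scheme such as the one in (i) differs in that it draws the imputed values stochastically rather than replacing them with a fixed reconstruction.

```python
import numpy as np


def iterative_pca_impute(X, n_components, max_iter=100, tol=1e-8):
    """Fill NaN cells of X by alternating a PCA fit and a rank-k reconstruction.

    Sketch only: the name, the column-mean initialisation and the defaults are
    hypothetical choices, not taken from the paper. No variable scaling is done.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from a univariate fill: the mean of the observed values per column.
    col_means = np.nanmean(X, axis=0)
    Xf = np.where(missing, col_means, X)
    for _ in range(max_iter):
        mu = Xf.mean(axis=0)
        Xc = Xf - mu
        # PCA of the mean-centred, currently completed data via SVD.
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        P = Vt[:n_components].T          # loadings, shape (m, k)
        T = Xc @ P                       # scores, shape (n, k)
        X_hat = T @ P.T + mu             # rank-k reconstruction of the data
        # Largest change in any imputed cell during this iteration.
        change = np.max(np.abs(X_hat[missing] - Xf[missing])) if missing.any() else 0.0
        # Overwrite only the missing cells; observed values are never altered.
        Xf[missing] = X_hat[missing]
        if change < tol:
            break
    return Xf


if __name__ == "__main__":
    # Hypothetical toy data: three perfectly correlated variables (rank one).
    t = np.arange(1.0, 7.0)
    p = np.array([1.0, 2.0, 3.0])
    X_true = np.outer(t, p)
    X = X_true.copy()
    X[2, 1] = np.nan                     # true value is 6.0
    print("column-mean fill:", np.nanmean(X, axis=0)[1])
    print("iterative PCA fill:", iterative_pca_impute(X, n_components=1)[2, 1])
```

On this toy example the column-mean fill gives 7.2 because it ignores the other two variables, while the iterative PCA fill should converge to a value very close to the true 6.0 by exploiting the correlation structure. That contrast illustrates, on a small scale, why the abstract argues that pre-treatment should match a multivariate end use.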