An efficient nonlinear programming strategy for PCA models with incomplete data sets



Author(s) - Rodrigo López‐Negrete de la Fuente, Salvador García‐Muñoz, Lorenz T. Biegler

Publication year - 2010

Publication title - Journal of Chemometrics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.47

H-Index - 92

eISSN - 1099-128X

pISSN - 0886-9383

DOI - 10.1002/cem.1306

Subject(s) - missing data, principal component analysis, nonlinear system, curse of dimensionality, computer science, partial least squares regression, minification, data mining, nonlinear programming, algorithm, artificial intelligence, mathematics, pattern recognition (psychology), mathematical optimization, machine learning, physics, quantum mechanics

Processing plants can produce large amounts of data that process engineers use for analysis, monitoring, or control. Principal component analysis (PCA) is well suited to analyzing large amounts of (possibly) correlated data and to reducing the dimensionality of the variable space. Failing online sensors, lost historical data, or missing experiments can lead to data sets with missing values, for which the current methods for obtaining the PCA model parameters may give questionable results due to the properties of the estimated parameters. This paper proposes a method based on nonlinear programming (NLP) techniques to obtain the parameters of PCA models in the presence of incomplete data sets. We show the relationship between the nonlinear iterative partial least squares (NIPALS) algorithm and the optimality conditions of the squared residuals minimization problem, and how this leads to the modified NIPALS used for the missing value problem. Moreover, we compare the current NIPALS‐based methods with the proposed NLP using a simulation example and an industrial case study, and show that the latter is better suited when there are large amounts of missing values. The solutions obtained with the NLP and the iterative algorithm (IA) are very similar. However, when using the NLP‐based method, the loadings and scores are guaranteed to be orthogonal, and the scores will have zero mean. The latter is emphasized in the industrial case study. Also, with the industrial data used here, we are able to show that the models obtained with the NLP were easier to interpret. Moreover, when using the NLP, far fewer iterations were required to obtain them. Copyright © 2010 John Wiley & Sons, Ltd.
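To make the iterative baseline mentioned in the abstract more concrete, the following is a minimal sketch of a NIPALS-style PCA that skips missing entries, written with NumPy. It is not the authors' code and it is not their NLP formulation: the function name `nipals_missing`, its parameters, the NaN encoding of missing values, and the choice of starting vector are all assumptions made for illustration.

```python
import numpy as np

def nipals_missing(X, n_components=2, max_iter=500, tol=1e-10):
    """Illustrative NIPALS-style PCA that ignores NaN entries (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)                    # True where a value was observed
    W = mask.astype(float)                 # same mask as 0/1 weights
    col_means = np.nanmean(X, axis=0)      # column means over observed values only
    R = np.where(mask, X - col_means, 0.0) # centered residual, zero at missing entries
    n, m = R.shape
    T = np.zeros((n, n_components))        # scores
    P = np.zeros((m, n_components))        # loadings
    for a in range(n_components):
        # Assumed starting guess: the column with the largest observed sum of squares
        t = R[:, np.argmax((R ** 2).sum(axis=0))].copy()
        p = np.zeros(m)
        for _ in range(max_iter):
            # Least-squares loading per column, using only rows observed in that column
            p = (R.T @ t) / np.maximum(W.T @ (t ** 2), 1e-12)
            p /= np.linalg.norm(p)
            # Least-squares score per row, using only columns observed in that row
            t_new = (R @ p) / np.maximum(W @ (p ** 2), 1e-12)
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, a], P[:, a] = t, p
        # Deflate on observed entries only; missing entries stay at zero
        R = np.where(mask, R - np.outer(t, p), 0.0)
    return T, P, col_means
```

On a complete data matrix this sketch reduces to ordinary NIPALS, so the loadings come out orthonormal up to the convergence tolerance. With missing entries each loading is still scaled to unit norm, but nothing in the iteration enforces mutual orthogonality of the loadings or zero-mean scores, which are the properties the abstract attributes to the NLP-based method.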
