z-logo
Premium
Fast and stable partial least squares modelling: A benchmark study with theoretical comments
Author(s) -
Björck Åke,
Indahl Ulf G.
Publication year - 2017
Publication title -
journal of chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.2898
Subject(s) - benchmark (surveying) , partial least squares regression , algorithm , computer science , matlab , kernel (algebra) , chemometrics , context (archaeology) , mathematics , machine learning , geodesy , combinatorics , biology , geography , operating system , paleontology
Algorithms for partial least squares (PLS) modelling are placed into a sound theoretical context focusing on numerical precision and computational efficiency. NIPALS and other PLS algorithms that perform deflation steps of the predictors ( X ) may be slow or even computationally infeasible for sparse and/or large‐scale data sets. As alternatives, we develop new versions of the Bidiag1 and Bidiag2 algorithms. These include full reorthogonalization of both score and loading vectors, which we consider to be both necessary and sufficient for numerical precision. Using a collection of benchmark data sets, these 2 new algorithms are compared to the NIPALS PLS and 4 other PLS algorithms acknowledged in the chemometrics literature. The provably stable Householder algorithm for PLS regression is taken as the reference method for numerical precision. Our conclusion is that our new Bidiag1 and Bidiag2 algorithms are the methods of choice for problems where both efficiency and numerical precision are important. When efficiency is not urgent, the NIPALS PLS and the Householder PLS are also good choices. The benchmark study shows that SIMPLS gives poor numerical precision even for a small number of factors. Further, the nonorthogonal scores PLS, direct scores PLS, and the improved kernel PLS are demonstrated to be numerically less stable than the best algorithms. Prototype MATLAB codes are included for the 5 PLS algorithms concluded to be numerically stable on our benchmark data sets. Other aspects of PLS modelling, such as the evaluation of the regression coefficients, are also analyzed using techniques from numerical linear algebra.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here