Model selection techniques for sparse weight‐based principal component analysis

Author(s) - Schipper Niek C., Van Deun Katrijn

Publication year - 2021

Publication title - Journal of Chemometrics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.47

H-Index - 92

eISSN - 1099-128X

pISSN - 0886-9383

DOI - 10.1002/cem.3289

Subject(s) - principal component analysis, lasso, convex hull, regularization, model selection, computer science, mathematics, selection, bayesian information criterion, cross validation, data set, bayesian probability, component analysis, pattern recognition, artificial intelligence, algorithm, regular polygon, geometry, world wide web

Many studies make use of multiple types of data that are collected for the same set of samples, resulting in so‐called multiblock data (e.g., multiomics studies). A popular analysis framework is sparse principal component analysis (PCA) of the concatenated data. The sparseness in the component weights of these models is usually induced by penalties. A crucial factor in the use of such penalized methods is a proper tuning of the regularization parameters used to give more or less weight to the penalties. In this paper, we examine several model selection procedures to tune these regularization parameters for sparse PCA. The model selection procedures include cross‐validation, Bayesian information criterion (BIC), index of sparseness, and the convex hull procedure. Furthermore, to account for the multiblock structure, we present a sparse PCA algorithm with a group least absolute shrinkage and selection operator (LASSO) penalty added to it, to either select or cancel out blocks of data in an automated way. We also study how well each of these model selection procedures tunes the group LASSO parameter. We conclude that when the component weights are to be interpreted, cross‐validation with the one standard error rule is preferred; alternatively, if the interest lies in obtaining component scores using a very limited set of variables, the convex hull, BIC, and index of sparseness are all suitable.
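
The one standard error rule named in the abstract can be illustrated with a short sketch. It assumes that a cross‐validation error has already been computed per fold for each candidate regularization value; the function name, argument names, and numbers below are hypothetical and are not taken from the paper, which may compute and aggregate the errors differently.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def one_standard_error_rule(
    penalties: Sequence[float],
    fold_errors: Sequence[Sequence[float]],
) -> float:
    """Return the largest penalty whose mean CV error lies within one
    standard error of the lowest mean CV error.

    penalties[i] is a candidate regularization value (hypothetical input);
    fold_errors[i] holds its error on each held-out fold.
    """
    means = [mean(errs) for errs in fold_errors]
    # Standard error of the mean across folds, per candidate.
    ses = [stdev(errs) / sqrt(len(errs)) for errs in fold_errors]

    # Candidate with the lowest mean cross-validation error.
    best = min(range(len(penalties)), key=lambda i: means[i])
    threshold = means[best] + ses[best]

    # Among candidates that are close enough to the best one,
    # prefer the strongest penalty, i.e. the sparsest model.
    eligible = [p for p, m in zip(penalties, means) if m <= threshold]
    return max(eligible)


# Made-up fold errors for four candidate penalty values.
penalties = [0.0, 0.1, 0.5, 1.0]
fold_errors = [
    [1.00, 1.10, 0.90],
    [1.02, 1.08, 0.96],
    [1.05, 1.15, 1.00],
    [1.60, 1.70, 1.50],
]
selected = one_standard_error_rule(penalties, fold_errors)
print(selected)  # 0.1
```

In this toy input the unpenalized model has the lowest mean error, but the penalty 0.1 is still within one standard error of it, so the rule selects the sparser of the two.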
