Order selection and sparsity in latent variable models via the ordered factor LASSO
Author(s) - Hui Francis K. C., Tanaka Emi, Warton David I.
Publication year - 2018
Publication title - Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/biom.12888
Subject(s) - lasso (programming language), latent variable, selection (genetic algorithm), variable (mathematics), mathematics, feature selection, factor (programming language), latent variable model, order (exchange), computer science, statistics, econometrics, artificial intelligence, economics, mathematical analysis, world wide web, finance, programming language
Summary - Generalized linear latent variable models (GLLVMs) offer a general framework for flexibly analyzing data involving multiple responses. When fitting such models, two of the major challenges are selecting the order, that is, the number of factors, and an appropriate structure for the loading matrix, typically a sparse structure. Motivated by the application of GLLVMs to study marine species assemblages in the Southern Ocean, we propose the Ordered Factor LASSO or OFAL penalty for order selection and achieving sparsity in GLLVMs. The OFAL penalty is the first penalty developed specifically for order selection in latent variable models, and achieves this by using a hierarchically structured group LASSO type penalty to shrink entire columns of the loading matrix to zero, while ensuring that non‐zero loadings are concentrated on the lower‐order factors. Simultaneously, individual element sparsity is achieved through the use of an adaptive LASSO. In conjunction with using an information criterion which promotes aggressive shrinkage, simulation shows that the OFAL penalty performs strongly compared with standard methods and penalties for order selection, achieving sparsity, and prediction in GLLVMs. Applying the OFAL penalty to the Southern Ocean marine species dataset suggests the available environmental predictors explain roughly half of the total covariation between species, thus leading to a smaller number of latent variables and increased sparsity in the loading matrix compared to a model without any covariates.
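To make the idea of a hierarchically structured group penalty concrete, the following is a minimal Python sketch of one way such a penalty could be evaluated. It is an illustration only and is not taken from the paper: the nested grouping (group k holds columns k through the last), the function name `ordered_factor_penalty`, and the arguments `group_weights`, `element_weights`, and `s` are all assumptions made for this example, and the exact OFAL form and weights may differ.

```python
import numpy as np


def ordered_factor_penalty(loadings, group_weights, element_weights, s):
    """Evaluate an illustrative hierarchical group + adaptive LASSO penalty.

    Assumed form (hypothetical, not confirmed against the paper):
        s * ( sum_k group_weights[k] * ||loadings[:, k:]||_F
              + sum_{j,k} element_weights[j, k] * |loadings[j, k]| )
    """
    loadings = np.asarray(loadings, dtype=float)
    element_weights = np.asarray(element_weights, dtype=float)
    n_factors = loadings.shape[1]

    # Nested groups: group k contains columns k, k+1, ..., n_factors - 1,
    # so a higher-order column appears in more groups and is penalized more.
    group_term = sum(
        group_weights[k] * np.linalg.norm(loadings[:, k:])  # Frobenius norm
        for k in range(n_factors)
    )

    # Element-wise adaptive LASSO term for sparsity within columns.
    element_term = np.sum(element_weights * np.abs(loadings))

    return s * (group_term + element_term)


# Same loadings placed on the first factor versus the second factor.
low_order = np.array([[3.0, 0.0], [4.0, 0.0]])
high_order = np.array([[0.0, 3.0], [0.0, 4.0]])
unit_group = np.ones(2)
unit_element = np.ones((2, 2))

print(ordered_factor_penalty(low_order, unit_group, unit_element, s=1.0))   # 12.0
print(ordered_factor_penalty(high_order, unit_group, unit_element, s=1.0))  # 17.0
```

Under these assumed unit weights, the same loadings cost more when they sit on the second factor than on the first, which is the mechanism by which a nested group penalty of this kind would push non-zero loadings toward lower-order factors and allow trailing columns to be shrunk to zero together.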