
Penalized Regression for Multiple Types of Many Features With Missing Data
Author(s) - Kin Yau Wong, Donglin Zeng, Danyu Lin
Publication year - 2023
Publication title - Statistica Sinica
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.24
H-Index - 77
eISSN - 1996-8507
pISSN - 1017-0405
DOI - 10.5705/ss.202020.0401
Subject(s) - estimator, missing data, computer science, feature selection, variable (mathematics), latent variable, expectation–maximization algorithm, regression, maximization, data type, sample size determination, regression analysis, data mining, statistics, artificial intelligence, maximum likelihood, machine learning, mathematics, mathematical optimization, mathematical analysis, programming language
Recent technological advances have made it possible to measure multiple types of many features in biomedical studies. However, some data types or features may not be measured for all study subjects because of cost or other constraints. We use a latent variable model to characterize the relationships across and within data types and to infer missing values from observed data. We develop a penalized-likelihood approach for variable selection and parameter estimation and devise an efficient expectation-maximization algorithm to implement our approach. We establish the asymptotic properties of the proposed estimators when the number of features increases at a polynomial rate of the sample size. Finally, we demonstrate the usefulness of the proposed methods using extensive simulation studies and provide an application to a motivating multi-platform genomics study.
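The abstract does not specify the model, so the following is only a toy illustration of the general idea of inferring missing values through a latent variable fitted by expectation-maximization. It assumes a one-factor Gaussian model, x_ij = mu_j + lam_j * z_i + eps_ij with z_i ~ N(0, 1) and eps_ij ~ N(0, psi_j), and assumes entries are missing at random, so missing entries simply drop out of the likelihood. The model, the functions `e_step`, `observed_loglik` and `one_factor_em`, and the simulated data are all hypothetical. This is not the authors' model or algorithm. In particular, the sketch has no penalty term and no outcome regression.

```python
import numpy as np


def e_step(X, obs, mu, lam, psi):
    """Posterior mean m_i and variance v_i of latent z_i given row i's observed entries."""
    Xf = np.where(obs, X, 0.0)  # placeholder zeros are always masked out by obs
    prec = 1.0 + (obs * (lam**2 / psi)).sum(axis=1)
    v = 1.0 / prec
    m = v * (obs * (lam * (Xf - mu) / psi)).sum(axis=1)
    return m, v


def observed_loglik(X, obs, mu, lam, psi):
    """Observed-data log-likelihood: x_i,obs ~ N(mu_obs, lam_obs lam_obs^T + diag(psi_obs))."""
    total = 0.0
    for i in range(X.shape[0]):
        o = obs[i]
        if not o.any():
            continue
        d = X[i, o] - mu[o]
        S = np.outer(lam[o], lam[o]) + np.diag(psi[o])
        _, logdet = np.linalg.slogdet(S)
        total += -0.5 * (o.sum() * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(S, d))
    return total


def one_factor_em(X, n_iter=100, psi_floor=1e-6):
    """Hypothetical toy EM for a one-factor model; missing entries are np.nan."""
    obs = ~np.isnan(X)
    Xf = np.where(obs, X, 0.0)
    p = X.shape[1]
    mu = np.nanmean(X, axis=0)
    lam = np.full(p, 0.5)
    psi = np.nanvar(X, axis=0)
    loglik = []
    for _ in range(n_iter):
        loglik.append(observed_loglik(X, obs, mu, lam, psi))
        m, v = e_step(X, obs, mu, lam, psi)
        Ez2 = m**2 + v  # E[z_i^2 | observed entries of row i]
        for j in range(p):
            o = obs[:, j]
            x, mo, z2 = Xf[o, j], m[o], Ez2[o]
            # Closed-form maximizer of the expected complete-data log-likelihood in (mu_j, lam_j)
            A = np.array([[o.sum(), mo.sum()], [mo.sum(), z2.sum()]])
            b = np.array([x.sum(), (mo * x).sum()])
            mu[j], lam[j] = np.linalg.solve(A, b)
            # Closed-form update of psi_j: mean expected squared residual over observed rows
            r = x - mu[j]
            psi[j] = max(np.mean(r**2 - 2 * lam[j] * mo * r + lam[j] ** 2 * z2), psi_floor)
    loglik.append(observed_loglik(X, obs, mu, lam, psi))
    m, _ = e_step(X, obs, mu, lam, psi)
    # Impute with E[x_ij | observed entries] = mu_j + lam_j * E[z_i | observed entries]
    X_hat = np.where(obs, X, mu + np.outer(m, lam))
    return X_hat, np.array(loglik), (mu, lam, psi)


# Simulated example: 200 subjects, 6 features, about 20% of entries missing at random
rng = np.random.default_rng(0)
n, p = 200, 6
z = rng.normal(size=n)
true_lam = np.array([1.0, 0.8, -0.6, 1.2, 0.0, 0.5])
X_full = 2.0 + np.outer(z, true_lam) + 0.5 * rng.normal(size=(n, p))
X = X_full.copy()
X[rng.random((n, p)) < 0.2] = np.nan
X_hat, loglik, (mu, lam, psi) = one_factor_em(X, n_iter=50)
```

Because each M-step maximizes the expected complete-data log-likelihood exactly (subject to the small floor on psi), the recorded observed-data log-likelihood should not decrease across iterations, which makes a convenient sanity check. A penalized-likelihood variant would subtract a penalty, such as a lasso term, from the M-step objective.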