Scalable estimation and regularization for the logistic normal multinomial model
Author(s) - Zhang Jingru, Lin Wei
Publication year - 2019
Publication title - Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/biom.13071
Subject(s) - regularization (linguistics) , multinomial distribution , computer science , estimation , scalability , multinomial logistic regression , statistics , econometrics , mathematics , mathematical optimization , artificial intelligence , machine learning , economics , management , database
Clustered multinomial data are prevalent in a variety of applications such as microbiome studies, where metagenomic sequencing data are summarized as multinomial counts for a large number of bacterial taxa per subject. Count normalization with ad hoc zero adjustment tends to result in poor estimates of abundances for taxa with zero or small counts. To account for heterogeneity and overdispersion in such data, we suggest using the logistic normal multinomial (LNM) model with an arbitrary correlation structure to simultaneously estimate the taxa compositions by borrowing information across subjects. We overcome the computational difficulties in high dimensions by developing a stochastic approximation EM algorithm with Hamiltonian Monte Carlo sampling for scalable parameter estimation in the LNM model. The ill‐conditioning problem due to unstructured covariance is further mitigated by a covariance‐regularized estimator with a condition number constraint. The advantages of the proposed methods are illustrated through simulations and an application to human gut microbiome data.
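As a rough illustration of the two ingredients named in the abstract, the following Python sketch draws counts from a logistic normal multinomial model and then applies a simple condition number constraint to an ill-conditioned covariance matrix. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the common parameterization in which latent Gaussian log-ratios are mapped to compositions with the last taxon as reference, and it caps the condition number by truncating eigenvalues to an interval [tau, kappa_max * tau]. The function names `sample_lnm` and `clip_condition_number`, and the values of `tau` and `kappa_max`, are hypothetical choices made for illustration. The stochastic approximation EM algorithm and the Hamiltonian Monte Carlo sampler are not shown.

```python
import numpy as np


def sample_lnm(n_subjects, depth, mu, sigma, rng):
    """Draw multinomial counts whose compositions follow a logistic normal law.

    Assumed parameterization: z_i ~ N(mu, sigma) in R^K, and the (K+1)-taxon
    composition is the inverse additive log-ratio transform of z_i with the
    last taxon as reference.
    """
    z = rng.multivariate_normal(mu, sigma, size=n_subjects)    # shape (n, K)
    expz = np.exp(z)
    denom = 1.0 + expz.sum(axis=1, keepdims=True)
    p = np.hstack([expz, np.ones((n_subjects, 1))]) / denom    # shape (n, K+1)
    counts = np.vstack([rng.multinomial(depth, p_i) for p_i in p])
    return counts, p


def clip_condition_number(s, kappa_max, tau):
    """Truncate the eigenvalues of a symmetric matrix to [tau, kappa_max * tau].

    The result has condition number at most kappa_max. Here tau is supplied by
    the caller (a hypothetical choice); a full estimator would select it from
    the data.
    """
    s = (s + s.T) / 2.0                       # enforce exact symmetry
    vals, vecs = np.linalg.eigh(s)
    clipped = np.clip(vals, tau, kappa_max * tau)
    return (vecs * clipped) @ vecs.T          # V diag(clipped) V^T


rng = np.random.default_rng(0)
K = 4                                         # K log-ratios, K + 1 taxa
mu = np.zeros(K)
sigma = 0.5 * np.eye(K) + 0.5                 # exchangeable, positive definite
counts, p = sample_lnm(n_subjects=50, depth=1000, mu=mu, sigma=sigma, rng=rng)

# A sample covariance from 3 observations in 4 dimensions is singular,
# so its condition number is effectively unbounded before regularization.
s = np.cov(rng.normal(size=(3, K)), rowvar=False)
reg = clip_condition_number(s, kappa_max=10.0, tau=0.1)
```

Each row of `counts` sums to the sequencing depth, and `reg` is a positive definite matrix whose condition number does not exceed `kappa_max`, which is the sense in which the constraint mitigates ill-conditioning.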