A sparse negative binomial mixture model for clustering RNA-seq count data

Open Access

Author(s) - Yujia Li, Tahminur Rahman, Tianzhou Ma, Lu Tang, George C. Tseng

Publication year - 2021

Publication title - Biostatistics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 3.493

H-Index - 82

eISSN - 1468-4357

pISSN - 1465-4644

DOI - 10.1093/biostatistics/kxab025

Subject(s) - count data, feature selection, mixture model, lasso (statistics), cluster analysis, negative binomial distribution, computer science, model selection, Bayesian information criterion, Bayesian inference, Bayesian probability, inference, feature (machine learning), pattern recognition, data mining, artificial intelligence, mathematics, statistics, Poisson distribution

Clustering with variable selection is a challenging yet critical task for modern small-$n$-large-$p$ data. Existing methods based on sparse Gaussian mixture models or sparse $K$-means provide solutions for continuous data. Because RNA-seq technology is now prevalent but count-data models for clustering are lacking, the current practice is to normalize count expression data into continuous measures and then apply existing models that assume Gaussian data. In this article, we develop a negative binomial mixture model with lasso or fused lasso gene regularization to cluster samples (small $n$) with high-dimensional gene features (large $p$). A modified EM algorithm is used for inference, and the Bayesian information criterion (BIC) is used to select the tuning parameters. The method is compared with existing methods in extensive simulations and in two real transcriptomic applications, a rat brain study and a breast cancer study. The results show the superior performance of the proposed count-data model in clustering accuracy, feature selection, and biological interpretation of pathways.
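The abstract names the main ingredients (a negative binomial mixture over samples, a lasso penalty that selects genes, an EM algorithm, and BIC for tuning) without giving formulas. The Python sketch below shows one way these pieces could fit together, under simplifying assumptions that are not taken from the paper: the gene-wise NB size parameters `r` are known and held fixed; the lasso penalty is approximated by soft-thresholding each cluster's log-mean deviation from the gene's overall log-mean (a heuristic, not the exact penalized M-step); initialization uses farthest-point seeding; and the BIC degrees of freedom are assumed to be the mixing weights plus gene baselines plus nonzero deviations. All function names are hypothetical, and the sketch needs NumPy.

```python
import math

import numpy as np

# Hypothetical sketch, NOT the paper's algorithm: fixed NB sizes, and the lasso
# step is a heuristic soft-threshold on log-mean deviations.
_lgamma = np.vectorize(math.lgamma)  # element-wise log-gamma from the stdlib


def nb_logpmf(y, mu, r):
    """Log NB pmf with mean mu and size r, so Var(y) = mu + mu**2 / r."""
    return (_lgamma(y + r) - _lgamma(r) - _lgamma(y + 1)
            + r * np.log(r / (r + mu)) + y * np.log(mu / (r + mu)))


def soft_threshold(x, lam):
    """Lasso proximal operator: shrink x toward 0 by lam, zeroing small values."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def e_step(Y, mu, r, pi):
    """Posterior cluster probabilities (n x K) and the mixture log-likelihood."""
    ll = np.stack([nb_logpmf(Y, mu[k], r).sum(axis=1) for k in range(len(pi))], axis=1)
    ll = ll + np.log(pi + 1e-300)
    m = ll.max(axis=1, keepdims=True)
    lse = m + np.log(np.exp(ll - m).sum(axis=1, keepdims=True))  # log-sum-exp
    return np.exp(ll - lse), float(lse.sum())


def m_step(Y, Z, log_base, lam):
    """Update mixing weights and cluster means; soft-threshold log deviations."""
    pi = Z.mean(axis=0)
    w = Z.sum(axis=0)[:, None] + 1e-12
    mu_raw = (Z.T @ Y) / w  # weighted mean = NB MLE of the mean when r is fixed
    delta = soft_threshold(np.log(mu_raw + 1e-8) - log_base, lam)
    return pi, np.exp(log_base + delta), delta


def fit_sparse_nb_mixture(Y, K, r, lam, n_iter=50):
    """Cluster the rows (samples) of count matrix Y; return labels, genes, BIC."""
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    log_base = np.log(Y.mean(axis=0) + 1e-8)  # gene-wise overall log-mean
    # Farthest-point seeding on the log1p scale picks K dissimilar samples.
    X = np.log1p(Y)
    idx = [0]
    for _ in range(1, K):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(d)))
    mu, pi = Y[idx] + 0.5, np.full(K, 1.0 / K)
    delta = np.log(mu) - log_base
    for _ in range(n_iter):
        Z = e_step(Y, mu, r, pi)[0]
        pi, mu, delta = m_step(Y, Z, log_base, lam)
    Z, loglik = e_step(Y, mu, r, pi)
    selected = np.flatnonzero(np.any(delta != 0, axis=0))  # genes kept by the penalty
    df = (K - 1) + p + np.count_nonzero(delta)  # assumed parameter count
    bic = -2.0 * loglik + df * math.log(n)
    return Z.argmax(axis=1), selected, bic
```

Genes whose deviations are zero in every cluster no longer influence which cluster a sample is assigned to, which is the sense in which the penalty performs feature selection. In practice one would refit over a grid of `lam` (and `K`) values and keep the fit with the smallest BIC. The fused lasso variant mentioned in the abstract is not sketched here.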
