TWO‐SIGMA: A novel two‐component single cell model‐based association method for single‐cell RNA‐seq data
Author(s) - Van Buren Eric, Hu Ming, Weng Chen, Jin Fulai, Li Yan, Wu Di, Li Yun
Publication year - 2021
Publication title - Genetic Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.301
H-Index - 98
eISSN - 1098-2272
pISSN - 0741-0395
DOI - 10.1002/gepi.22361
Subject(s) - sigma, covariate, negative binomial distribution, statistics, random effects model, mathematics, computer science, algorithm, medicine, physics, meta analysis, quantum mechanics, poisson distribution
In this paper, we develop TWO-SIGMA, a TWO-component SInGle cell Model-based Association method for differential expression (DE) analyses in single-cell RNA-seq (scRNA-seq) data. The first component models the probability of "drop-out" with a mixed-effects logistic regression model, and the second component models the (conditional) mean expression with a mixed-effects negative binomial regression model. TWO-SIGMA is extremely flexible in that it: (i) does not require a log-transformation of the outcome, (ii) allows for overdispersed and zero-inflated counts, (iii) accommodates a correlation structure between cells from the same individual via random effect terms, (iv) can analyze unbalanced designs (in which the number of cells does not need to be identical for all samples), (v) can control for additional sample-level and cell-level covariates including batch effects, (vi) provides interpretable effect size estimates, and (vii) enables general tests of DE beyond two-group comparisons. To our knowledge, TWO-SIGMA is the only method for analyzing scRNA-seq data that offers all of these features simultaneously. Simulation studies show that TWO-SIGMA outperforms alternative regression-based approaches in both type-I error control and power when the data contain even moderate within-sample correlation. A real data analysis using pancreatic islet single cells exhibits the flexibility of TWO-SIGMA and demonstrates that incorrectly omitting random effect terms can have dramatic impacts on scientific conclusions. TWO-SIGMA is implemented in the R package twosigma, available at https://github.com/edvanburen/twosigma.
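The two components described in the abstract can be illustrated with a small numerical sketch of a zero-inflated negative binomial probability mass function for one cell. This is not the twosigma package's code or API. The coefficient names, the numeric values, the single covariate `x`, and the negative binomial parameterization (mean `mu`, size `phi`, variance `mu + mu**2 / phi`) are all assumptions chosen for illustration. The sketch only shows how a logit-scale drop-out predictor and a log-scale mean predictor, each with an individual-level random intercept, might combine into one count distribution.

```python
import math


def inv_logit(x):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def nb_log_pmf(y, mu, phi):
    """Log pmf of a negative binomial with mean mu and size phi.

    Assumed parameterization: variance = mu + mu**2 / phi.
    """
    return (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + phi * math.log(phi / (phi + mu))
        + y * math.log(mu / (phi + mu))
    )


def zinb_pmf(y, drop_logit, log_mu, phi):
    """Zero-inflated negative binomial pmf for a single count y.

    drop_logit: linear predictor of the drop-out (logistic) component.
    log_mu:     linear predictor of the conditional mean (log link).
    """
    p = inv_logit(drop_logit)
    nb = math.exp(nb_log_pmf(y, math.exp(log_mu), phi))
    # A zero arises either from drop-out or from the count component.
    return (p if y == 0 else 0.0) + (1.0 - p) * nb


# Hypothetical fixed effects: intercept and one binary covariate x.
alpha0, alpha1 = -1.0, 0.5   # drop-out component
beta0, beta1 = 1.0, 0.7      # mean component
# Hypothetical random intercepts for the individual this cell came from.
a_i, b_i = 0.3, -0.2
phi = 2.0                    # hypothetical overdispersion (size) parameter
x = 1

drop_logit = alpha0 + alpha1 * x + a_i
log_mu = beta0 + beta1 * x + b_i

p_drop = inv_logit(drop_logit)
mu = math.exp(log_mu)

# Evaluate the pmf on a grid wide enough that the remaining tail is negligible.
probs = [zinb_pmf(y, drop_logit, log_mu, phi) for y in range(2000)]
total = sum(probs)
marginal_mean = sum(y * q for y, q in enumerate(probs))

print(f"P(drop-out) = {p_drop:.3f}, conditional mean = {mu:.3f}")
print(f"P(Y = 0) = {probs[0]:.3f}, marginal mean = {marginal_mean:.3f}")
```

In this sketch the marginal mean equals `(1 - p_drop) * mu`, which illustrates why the mean component is interpreted as a conditional mean: it describes expression given that the cell was not a drop-out. Cells from the same individual would share `a_i` and `b_i`, and that sharing is what induces within-sample correlation.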