
TWO-SIGMA-G: a new competitive gene set testing framework for scRNA-seq data accounting for inter-gene and cell–cell correlation
Author(s) - Eric Van Buren, Ming Hu, Cheng Liu, John A. Wrobel, Kirk C. Wilhelmsen, Lishan Su, Yun Le, Di Wu
Publication year - 2022
Publication title - Briefings in Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.204
H-Index - 113
eISSN - 1477-4054
pISSN - 1467-5463
DOI - 10.1093/bib/bbac084
Subject(s) - sigma, computational biology, correlation, set (abstract data type), data set, flexibility (engineering), regression, type i and type ii errors, inference, statistical hypothesis testing, computer science, gene, regression analysis, biology, statistics, artificial intelligence, mathematics, genetics, machine learning, physics, geometry, quantum mechanics, programming language
We propose TWO-SIGMA-G, a competitive gene set test for scRNA-seq data. TWO-SIGMA-G uses a mixed-effects regression model, based on our previously published TWO-SIGMA model, to test for differential expression at the gene level. This regression-based model provides flexibility and rigor at the gene level in (1) handling complex experimental designs, (2) accounting for the correlation between biological replicates and (3) accommodating the distribution of scRNA-seq data to improve statistical inference. Moreover, TWO-SIGMA-G uses a novel approach to adjust for inter-gene correlation (IGC) at the set level, thereby controlling the set-level false positive rate. Simulations demonstrate that, compared with other methods, TWO-SIGMA-G preserves the type I error rate and increases power in the presence of IGC. Application to two datasets identified HIV-associated interferon pathways in xenograft mice and pathways associated with Alzheimer's disease progression in humans.
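The two-stage idea (gene-level statistics first, then a set-level competitive comparison that accounts for IGC) can be illustrated with a minimal sketch. It assumes a CAMERA-style variance inflation factor, 1 + (m - 1) * rho_bar, for a test set of m genes with average pairwise correlation rho_bar. This is a swapped-in, well-known technique used for illustration and should not be read as TWO-SIGMA-G's published implementation. The function names `competitive_set_test` and `mean_pairwise_correlation`, the normal approximation for the p-value, and estimating rho_bar as the mean Pearson correlation of per-gene residual vectors are all hypothetical choices.

```python
import math
from statistics import mean, variance


def mean_pairwise_correlation(residuals):
    """Hypothetical IGC estimate: average Pearson correlation over all gene pairs.

    residuals: list of equal-length, non-constant lists, one per gene
               (e.g. model residuals across cells; an assumption here).
    """
    def pearson(a, b):
        ma, mb = mean(a), mean(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a)
        vb = sum((y - mb) ** 2 for y in b)
        return cov / math.sqrt(va * vb)

    g = len(residuals)
    pairs = [(i, j) for i in range(g) for j in range(i + 1, g)]
    return sum(pearson(residuals[i], residuals[j]) for i, j in pairs) / len(pairs)


def competitive_set_test(z, in_set, rho_bar):
    """CAMERA-style competitive test on gene-level statistics (illustrative sketch).

    z       : gene-level statistics for all genes (e.g. Z scores from a DE model)
    in_set  : booleans, True if the gene belongs to the test set
    rho_bar : average inter-gene correlation (IGC) within the test set
    Returns (statistic, two-sided p-value under a normal approximation).
    """
    test = [zi for zi, s in zip(z, in_set) if s]
    ref = [zi for zi, s in zip(z, in_set) if not s]
    m, k = len(test), len(ref)
    if m < 2 or k < 2:
        raise ValueError("need at least two genes in both test and reference sets")
    # Pooled variance of the gene-level statistics across both groups.
    pooled = ((m - 1) * variance(test) + (k - 1) * variance(ref)) / (m + k - 2)
    # The variance of a mean of m statistics with average pairwise correlation
    # rho_bar is (1 + (m - 1) * rho_bar) times the variance under independence.
    vif = 1.0 + (m - 1) * rho_bar
    if vif <= 0:
        raise ValueError("rho_bar too negative for a set of this size")
    se = math.sqrt(pooled * (vif / m + 1.0 / k))
    stat = (mean(test) - mean(ref)) / se
    # Two-sided p-value: P(|Z| > |stat|) = erfc(|stat| / sqrt(2)).
    p = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p


if __name__ == "__main__":
    # Toy gene-level statistics: the first three genes form the test set.
    z = [1.2, 0.8, 1.5, 0.1, -0.3, 0.4, -0.2, 0.0, 0.3, -0.1]
    in_set = [True, True, True] + [False] * 7
    for rho in (0.0, 0.5):
        stat, p = competitive_set_test(z, in_set, rho)
        print(f"rho_bar={rho}: statistic={stat:.2f}, p={p:.2e}")
```

On the same statistics, a positive rho_bar enlarges the standard error of the test-set mean and so shrinks the set-level statistic. This is how an IGC adjustment guards against the inflated false positive rate of competitive tests that treat genes as independent.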