Open Access
Transcriptome diversity is a systematic source of variation in RNA-sequencing data
Author(s) - Pablo García-Nieto, Ban Wang, Hunter B. Fraser
Publication year - 2022
Publication title - PLOS Computational Biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.628
H-Index - 182
eISSN - 1553-7358
pISSN - 1553-734X
DOI - 10.1371/journal.pcbi.1009939
Subject(s) - transcriptome, biology, RNA-seq, computational biology, genetics, gene, gene expression, evolutionary biology
RNA sequencing has been widely used as an essential tool to probe gene expression. While standard practices have been established to analyze RNA-seq data, it is still challenging to interpret and remove artifactual signals. Several biological and technical factors, such as sex, age, batches, and sequencing technology, have been found to bias gene expression estimates. Probabilistic estimation of expression residuals (PEER), which infers broad variance components in gene expression measurements, has been used to account for some systematic effects, but it has remained challenging to interpret these PEER factors. Here we show that transcriptome diversity, a simple metric based on Shannon entropy, explains a large portion of variability in gene expression and is the strongest known factor encoded in PEER factors. We then show that transcriptome diversity has significant associations with multiple technical and biological variables across diverse organisms and datasets. In sum, transcriptome diversity provides a simple explanation for a major source of variation in both gene expression estimates and PEER covariates.
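Transcriptome diversity is described here as a metric based on Shannon entropy, i.e. H = -Σ p_i ln p_i, where p_i is gene i's share of one sample's total expression. The Python sketch below is a minimal illustration under assumptions not taken from the paper: the input is a hypothetical mapping from sample IDs to non-negative per-gene expression values, the natural log is used, and no gene filtering or normalization is applied. The helper `variance_explained` (also hypothetical) reports the squared Pearson correlation between per-sample diversity and one gene's expression as one simple way to gauge how much of that gene's variability diversity accounts for; it is not necessarily the statistic the authors used.

```python
import math
from typing import Dict, List, Sequence


def transcriptome_diversity(values: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one sample's expression profile.

    Assumption: `values` holds non-negative expression values (e.g. read
    counts or TPM) for every gene in a single sample. Zero-expression genes
    contribute nothing, since p * ln(p) -> 0 as p -> 0.
    """
    total = float(sum(values))
    if total <= 0:
        raise ValueError("sample has no expression")
    entropy = 0.0
    for v in values:
        if v < 0:
            raise ValueError("expression values must be non-negative")
        if v > 0:
            p = v / total  # fraction of the sample's expression from this gene
            entropy -= p * math.log(p)
    return entropy


def diversity_per_sample(matrix: Dict[str, List[float]]) -> Dict[str, float]:
    """Compute diversity for each sample in a hypothetical
    {sample_id: per-gene values} mapping."""
    return {sample: transcriptome_diversity(vals) for sample, vals in matrix.items()}


def variance_explained(diversity: Sequence[float], gene: Sequence[float]) -> float:
    """Squared Pearson correlation (R^2 of a simple linear fit) between
    per-sample diversity and one gene's per-sample expression.
    Illustrative only; the paper's own statistic may differ."""
    n = len(diversity)
    if n != len(gene) or n < 2:
        raise ValueError("need two equal-length sequences with >= 2 samples")
    mean_x = sum(diversity) / n
    mean_y = sum(gene) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(diversity, gene))
    sxx = sum((x - mean_x) ** 2 for x in diversity)
    syy = sum((y - mean_y) ** 2 for y in gene)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation in one of the inputs
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    samples = {
        "even": [25, 25, 25, 25],  # four genes expressed equally
        "skewed": [97, 1, 1, 1],   # one gene dominates the sample
    }
    print(diversity_per_sample(samples))
```

In this toy example the evenly expressed sample scores ln 4 ≈ 1.386, the maximum possible for four genes, while the sample dominated by a single gene scores far lower (about 0.168). Whether entropy is computed on raw counts or normalized values, and whether lowly expressed genes are dropped first, can change the scores, so those choices should follow the paper's methods rather than this sketch.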
