Modeling Exon-Specific Bias Distribution Improves the Analysis of RNA-Seq Data

Open Access

Author(s) - Xuejun Liu, Li Zhang, Songcan Chen

Publication year - 2015

Publication title - PLOS ONE

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.99

H-Index - 332

ISSN - 1932-6203

DOI - 10.1371/journal.pone.0140032

Subject(s) - overdispersion, Poisson distribution, count data, computer science, Markov chain Monte Carlo, prior probability, expression (computer science), biology, computational biology, statistics, artificial intelligence, mathematics, Bayesian probability, programming language

RNA-seq technology has become an important tool for quantifying gene and transcript expression in transcriptome studies. The two major difficulties in gene and transcript expression quantification are read mapping ambiguity and the overdispersion of the read distribution along the reference sequence. Many approaches have been proposed to deal with these difficulties. A number of existing methods use a Poisson distribution to model the read counts, which makes it easy to split the counts into contributions from multiple transcripts. Meanwhile, various solutions have been put forward to account for overdispersion in the Poisson models. By checking the similarities among the variation patterns of read counts for individual genes, we found that the count variation is exon-specific and follows a pattern that is conserved across samples for each individual gene. We introduce Gamma-distributed latent variables to model the read sequencing preference for each exon. These variables are embedded in the rate parameter of a Poisson model to account for the overdispersion of the read distribution. The model is tractable since the Gamma priors can be integrated out in the maximum likelihood estimation. We evaluate the proposed approach, PGseq, using four real datasets and one simulated dataset, and compare its performance with other popular methods. Results show that PGseq performs competitively with the alternatives in terms of accuracy in gene and transcript expression calculation and in downstream differential expression analysis. In particular, we show the advantage of our method in the analysis of low expression levels.
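
To illustrate why a Gamma-distributed latent variable in a Poisson rate stays tractable, here is a minimal sketch for a single exon and a single transcript. It is not the PGseq implementation: the names `mu` (expected count without bias), `alpha` and `beta` (Gamma shape and rate of the exon bias) are assumptions made for this example, and the abstract does not specify the parameterization actually used. Under these assumptions, integrating the bias out of the Poisson likelihood gives a negative binomial in closed form, whose variance exceeds its mean.

```python
import math


def poisson_gamma_logpmf(y, mu, alpha, beta):
    """Log-probability of count y when y ~ Poisson(mu * b) and the
    latent bias b ~ Gamma(shape=alpha, rate=beta), with b integrated out.
    The result is a negative binomial log-pmf (illustrative parameterization)."""
    return (math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
            + alpha * math.log(beta / (beta + mu))
            + y * math.log(mu / (beta + mu)))


def marginal_mean_var(mu, alpha, beta):
    """Mean and variance of the marginal count distribution.
    The extra term mu^2 * alpha / beta^2 is the overdispersion."""
    mean = mu * alpha / beta
    var = mean + mu ** 2 * alpha / beta ** 2
    return mean, var


def numeric_marginal_pmf(y, mu, alpha, beta, upper=30.0, steps=20000):
    """Midpoint-rule integration of Poisson(y | mu*b) * Gamma(b | alpha, beta)
    over b, used only to check the closed form above."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        b = (i + 0.5) * h
        log_pois = y * math.log(mu * b) - mu * b - math.lgamma(y + 1)
        log_gam = (alpha * math.log(beta) + (alpha - 1) * math.log(b)
                   - beta * b - math.lgamma(alpha))
        total += math.exp(log_pois + log_gam) * h
    return total
```

For example, `marginal_mean_var(5.0, 2.0, 2.0)` returns a variance larger than the mean, which a plain Poisson model cannot represent. Because the marginal likelihood is available in closed form, it could be maximized directly with respect to `mu` without sampling the latent bias.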
