An Enumerative Combinatorics Model for Fragmentation Patterns in RNA Sequencing Provides Insights into Nonuniformity of the Expected Fragment Starting-Point and Coverage Profile

Celine Prakash; Arndt von Haeseler

Open Access

Author(s) - Celine Prakash, Arndt von Haeseler

Publication year - 2016

Publication title - Journal of Computational Biology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.585

H-Index - 95

eISSN - 1557-8666

pISSN - 1066-5277

DOI - 10.1089/cmb.2016.0096

Subject(s) - fragment (logic), RNA, fragmentation (computing), computational biology, biology, population, algorithm, genetics, combinatorics, computer science, mathematics, gene, ecology, demography, sociology

RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment.
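
The enumeration idea in the abstract can be illustrated with a small sketch. The abstract does not define "maximal placement" precisely, so the code below assumes one plausible reading: the largest possible number of non-overlapping fragments, k = floor(T / F), is placed on the transcript, and the leftover T - kF bases are distributed freely among the gaps before, between, and after the fragments. Every such configuration is weighted equally. The function names `enumerate_configurations` and `expected_profiles` are hypothetical, and this is not the authors' implementation.

```python
from fractions import Fraction
from itertools import combinations


def enumerate_configurations(T, F):
    """Yield each configuration as a tuple of 0-based fragment start positions.

    ASSUMPTION (not taken from the paper): a configuration places
    k = T // F non-overlapping fragments of length F, with the
    remaining r = T - k * F bases left uncovered as gaps.
    """
    if T <= 0 or F <= 0:
        raise ValueError("T and F must be positive")
    k = T // F       # number of fragments in each configuration
    r = T - k * F    # uncovered bases to distribute among the gaps
    # Stars and bars: arrange k fragments and r gap bases in k + r slots.
    for slots in combinations(range(k + r), k):
        # The i-th fragment in slot s is preceded by i fragments and
        # s - i gap bases, so it starts at i * F + (s - i).
        yield tuple(s + i * (F - 1) for i, s in enumerate(slots))


def expected_profiles(T, F):
    """Return (starting-point profile, coverage profile) as lists of Fractions.

    Each entry is the fraction of enumerated configurations in which a
    fragment starts at (or covers) that transcript position.
    """
    start_counts = [0] * T
    cover_counts = [0] * T
    n_configs = 0
    for starts in enumerate_configurations(T, F):
        n_configs += 1
        for s in starts:
            start_counts[s] += 1
            for pos in range(s, s + F):
                cover_counts[pos] += 1
    start_profile = [Fraction(c, n_configs) for c in start_counts]
    cover_profile = [Fraction(c, n_configs) for c in cover_counts]
    return start_profile, cover_profile
```

Under these assumptions, `expected_profiles(5, 2)` enumerates three configurations and gives an expected coverage of 2/3, 1, 2/3, 1, 2/3 across the five positions, so even this toy case is nonuniform. Exact fractions are used instead of floats so that small cases can be checked by hand.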
