Safe and Complete Contig Assembly Through Omnitigs

Open Access

Author(s) - Alexandru I. Tomescu, Paul Medvedev

Publication year - 2016

Publication title - Journal of Computational Biology

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.585

H-Index - 95

eISSN - 1557-8666

pISSN - 1066-5277

DOI - 10.1089/cmb.2016.0141

Subject(s) - contig, de Bruijn graph, genome, sequence assembly, computer science, set (abstract data type), string (physics), biology, mathematics, graph, theoretical computer science, combinatorics, computational biology, genetics, gene, programming language, gene expression, transcriptome, mathematical physics

Contig assembly is the first stage that most assemblers solve when reconstructing a genome from a set of reads. Its output consists of contigs: a set of strings that are promised to appear in any genome that could have generated the reads. Since the introduction of contigs 20 years ago, assemblers have tried to obtain ever longer contigs, but the following question remains: given a genome graph G (e.g., a de Bruijn graph or a string graph), what are all the strings that can be safely reported from G as contigs? In this article, we answer this question using a model in which the genome is a circular covering walk. We also give a polynomial-time algorithm to find such strings, which we call omnitigs. Our experiments show that omnitigs are 66% to 82% longer on average than the popular unitigs, and that 29% of dbSNP locations have more neighbors in omnitigs than in unitigs.
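To make the baseline concrete, the following is a minimal sketch of how unitigs (the strings that omnitigs are compared against) can be extracted from an edge-centric de Bruijn graph as maximal non-branching paths. It is not the omnitig algorithm from the article, which is not described in this abstract. The function names `de_bruijn`, `unitigs`, and `spell`, the toy reads, and the choice k = 3 are all assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Edge-centric de Bruijn graph (illustrative helper, not from the article).

    Nodes are (k-1)-mers; every distinct k-mer in the reads is one edge.
    Returns plain dicts mapping a node to its successor / predecessor sets.
    """
    succ, pred = defaultdict(set), defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            succ[kmer[:-1]].add(kmer[1:])
            pred[kmer[1:]].add(kmer[:-1])
    return dict(succ), dict(pred)


def unitigs(succ, pred):
    """Maximal non-branching paths, returned as lists of nodes."""
    nodes = set(succ) | set(pred)

    def is_internal(v):
        # A node can sit strictly inside a unitig only if it has exactly
        # one incoming and one outgoing edge.
        return len(pred.get(v, ())) == 1 and len(succ.get(v, ())) == 1

    paths, visited = [], set()
    # Start a path on every edge leaving a branching or endpoint node.
    for u in sorted(nodes):
        if is_internal(u):
            continue
        for v in sorted(succ.get(u, ())):
            path = [u, v]
            while is_internal(v):
                visited.add(v)
                v = next(iter(succ[v]))
                path.append(v)
            paths.append(path)
    # Remaining internal nodes lie on isolated cycles.
    for u in sorted(nodes):
        if is_internal(u) and u not in visited:
            cycle, v = [u], next(iter(succ[u]))
            visited.add(u)
            while v != u:
                visited.add(v)
                cycle.append(v)
                v = next(iter(succ[v]))
            cycle.append(u)
            paths.append(cycle)
    return paths


def spell(path):
    """String spelled by a node path: first node plus last char of the rest."""
    return path[0] + "".join(node[-1] for node in path[1:])


# Toy input (hypothetical reads, k = 3).
succ, pred = de_bruijn(["ACGTC", "CGTG"], 3)
print(sorted(spell(p) for p in unitigs(succ, pred)))
# prints ['ACGT', 'GTC', 'GTG']
```

In this toy graph the node GT has two outgoing edges, so the unitig ACGT stops there. Unitigs are safe under the article's model but are not all the safe strings; the article reports that omnitigs are longer on average.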
