EPGA: de novo assembly using the distributions of reads and insert size
Author(s) - Junwei Luo, Jianxin Wang, Zhen Zhang, FangXiang Wu, Min Li, Yi Pan
Publication year - 2014
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btu762
Subject(s) - contig, sequence assembly, hybrid genome assembly, de bruijn graph, computer science, sequence (biology), insert (composites), genome, de bruijn sequence, graph, extension (predicate logic), reference genome, algorithm, computational biology, biology, theoretical computer science, genetics, mathematics, combinatorics, mechanical engineering, gene expression, transcriptome, gene, engineering, programming language
In genome assembly, the primary issue is how to determine the upstream and downstream sequence regions of sequence seeds in order to construct long contigs or scaffolds. When a sequence seed is extended, repetitive regions in the genome give rise to multiple feasible extension candidates, which increases the difficulty of genome assembly. The generally accepted solution is to choose one candidate based on read overlaps and paired-end (mate-pair) reads. However, this solution runs into difficulties in some complex repetitive regions. In addition, sequencing errors may produce false repetitive regions, and uneven sequencing depth leaves some sequence regions with too few or too many reads. Together, these problems prevent existing assemblers from producing satisfactory assembly results.
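The ambiguity described in the abstract can be illustrated with a toy example. The sketch below is not EPGA's algorithm. It is a minimal k-mer successor table, built on an invented genome string, that shows how a repeat leaves a seed with more than one feasible next base. The function names `build_kmer_graph` and `extension_candidates`, the genome, the read length and the value of k are all assumptions made for illustration.

```python
from collections import defaultdict


def build_kmer_graph(reads, k):
    """Map each (k-1)-mer to the set of bases that follow it in any read."""
    successors = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            successors[kmer[:-1]].add(kmer[-1])
    return successors


def extension_candidates(seed, successors, k):
    """Return the bases that could extend the seed by one position."""
    return sorted(successors.get(seed[-(k - 1):], set()))


# Hypothetical toy genome: the repeat "ACGT" occurs twice,
# followed by "G" the first time and by "C" the second time.
genome = "TTACGTGGCCACGTCA"

# Error-free reads of length 8, one starting at every position (an idealisation).
reads = [genome[i:i + 8] for i in range(len(genome) - 8 + 1)]
graph = build_kmer_graph(reads, k=4)

# A seed ending in unique sequence has a single candidate.
unique = extension_candidates("TTAC", graph, k=4)

# A seed ending inside the repeat has two candidates.
ambiguous = extension_candidates("TACGT", graph, k=4)

print(unique)
print(ambiguous)
```

In this toy case, read overlaps alone cannot decide between the two candidates for the second seed. This is the situation in which assemblers turn to paired-end information, and in which the abstract notes that difficulties remain for complex repeats.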