CompMap: a reference-based compression program to speed up read mapping to related reference sequences
Author(s) - Zexuan Zhu, Linsen Li, Yongpeng Zhang, Yanli Yang, Xiao Yang
Publication year - 2014
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btu656
Subject(s) - computer science , reference genome , process (computing) , sequence (biology) , set (abstract data type) , task (project management) , data mining , speedup , reference model , data compression , software , dna sequencing , algorithm , parallel computing , programming language , software engineering , biology , dna , genetics , management , economics
Exhaustive mapping of next-generation sequencing data to a set of relevant reference sequences has become an important task in pathogen discovery and metagenomic classification. However, runtime and memory usage grow with the number of reference sequences and with the repeat content among them. In many applications, read mapping dominates the total running time. We developed CompMap, a reference-based compression program, to speed up this process. CompMap generates a non-redundant representative sequence for the input sequences. We have demonstrated that reads can be mapped to this representative sequence with much less time and memory, and that the mapping to the original reference sequences can be recovered with high accuracy.
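The idea of mapping to a non-redundant representative and then recovering positions on the original references can be illustrated with a toy sketch. This is not CompMap's algorithm, which the abstract does not describe in detail. The sketch assumes a deliberately naive scheme (collapsing identical fixed-size blocks) and uses exact string search in place of a real read mapper; the names `build_representative`, `recover_hits`, and the `block` parameter are hypothetical.

```python
import bisect
from collections import defaultdict


def build_representative(refs, block=4):
    """Toy scheme (assumption, not CompMap's method): keep one copy of each
    distinct fixed-size block and remember where every copy came from."""
    rep_parts = []
    block_offset = {}            # block sequence -> offset in representative
    origins = defaultdict(list)  # representative offset -> [(ref name, offset)]
    rep_len = 0
    for name, seq in refs.items():
        for start in range(0, len(seq), block):
            chunk = seq[start:start + block]
            if chunk not in block_offset:
                block_offset[chunk] = rep_len
                rep_parts.append(chunk)
                rep_len += len(chunk)
            origins[block_offset[chunk]].append((name, start))
    return "".join(rep_parts), dict(origins)


def recover_hits(rep_pos, origins):
    """Translate a position on the representative back to every original
    reference position that shares the containing block."""
    starts = sorted(origins)
    base = starts[bisect.bisect_right(starts, rep_pos) - 1]
    delta = rep_pos - base
    return [(name, off + delta) for name, off in origins[base]]


# Two hypothetical references sharing the block "ACGT".
refs = {"refA": "ACGTTTGA", "refB": "ACGTCCCC"}
rep, origins = build_representative(refs)

# Stand-in for a read mapper: exact search on the representative.
hit = rep.find("CGT")
print(len(rep), hit, recover_hits(hit, origins))
```

The representative here is shorter than the concatenated references, so the search runs over less sequence, and a single hit expands to both `refA` and `refB`. Reads that span a block boundary are not handled by this sketch; a practical tool would need to deal with them, along with inexact matches.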