MapReduce for accurate error correction of next-generation sequencing data
Author(s) - Liang Zhao, Qingfeng Chen, WenCui Li, Peng Jiang, Limsoon Wong, Jinyan Li
Publication year - 2017
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btx089
Subject(s) - computer science , error detection and correction , sequence (biology) , position (finance) , word error rate , source code , data mining , cloud computing , code (set theory) , layer (electronics) , set (abstract data type) , algorithm , artificial intelligence , biology , programming language , finance , economics , genetics , operating system , chemistry , organic chemistry
Next-generation sequencing platforms have produced huge amounts of sequence data, which is revolutionizing every aspect of genetic and genomic research. However, these sequence datasets contain a substantial number of machine-induced errors; for example, the substitution error rate can be as high as 2.5%. Existing error-correction methods are still far from perfect. In fact, they sometimes introduce more new errors than they correct, and this is especially true of the prevalent k-mer based methods. Existing methods have also made only limited use of on-demand cloud computing.