COSINE: non-seeding method for mapping long noisy sequences
Author(s) - Pegah Tootoonchi Afshar, Wing Hung Wong
Publication year - 2017
Publication title - Nucleic Acids Research
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 9.008
H-Index - 537
eISSN - 1362-4954
pISSN - 0305-1048
DOI - 10.1093/nar/gkx511
Subject(s) - cosine similarity, trigonometric functions, biology, similarity (geometry), context (archaeology), algorithm, discrete cosine transform, computer science, range (aeronautics), pattern recognition (psychology), sensitivity (control systems), artificial intelligence, biological system, mathematics, engineering, electronic engineering, paleontology, geometry, image (mathematics), aerospace engineering
Third-generation sequencing (TGS) technologies are highly promising, but the long, noisy reads they produce are difficult to align using existing algorithms. Here, we present COSINE, a conceptually new method designed specifically for aligning long reads contaminated by a high level of errors. COSINE computes the context similarity of two stretches of nucleobases from the similarity between the distributions of their short k-mers (k = 3-4) along the sequences. Results on simulated and real data show that COSINE achieves high sensitivity and specificity across a wide range of read accuracies. When the error rate is high, COSINE can offer substantial advantages over existing alignment methods.
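The idea of comparing two sequence stretches through their short k-mer distributions can be illustrated with a minimal sketch. This is not the COSINE implementation: the abstract does not specify how the distributions are built or compared, so the sketch assumes plain overlapping k-mer counts and ordinary cosine similarity. The function names (`kmer_counts`, `cosine_similarity`, `context_similarity`) are hypothetical and chosen for illustration only.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count overlapping k-mers in a sequence (assumed representation)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    # Counter returns 0 for missing keys, so unshared k-mers contribute nothing.
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sequence shorter than k has no k-mers
    return dot / (norm_a * norm_b)


def context_similarity(x, y, k=3):
    """Similarity of two stretches based on their k-mer count profiles."""
    return cosine_similarity(kmer_counts(x, k), kmer_counts(y, k))


if __name__ == "__main__":
    reference = "ACGTTGCAACGTTGCA"
    noisy_read = "ACGTTGCTACGTTGCA"   # one substitution relative to reference
    unrelated = "GGGGGGCCCCCCGGGG"
    print(context_similarity(reference, noisy_read))
    print(context_similarity(reference, unrelated))
```

Because a single substitution disturbs at most k of the overlapping k-mers, the profile of a noisy read stays close to that of its source region, which is the intuition behind using small k (3-4) when error rates are high.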