Minimally overlapping words for sequence similarity search
Author(s) - Martin C. Frith, Laurent Noé, Grégory Kucherov
Publication year - 2020
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btaa1054
Subject(s) - computer science, similarity (geometry), simple (philosophy), sequence (biology), seeding, software, artificial intelligence, boosting (machine learning), sensitivity (control systems), nearest neighbor search, pattern recognition (psychology), algorithm, biology, genetics, image (mathematics), epistemology, electronic engineering, agronomy, programming language, engineering, philosophy
Analysis of genetic sequences is usually based on finding similar parts of sequences, e.g. DNA reads and/or genomes. For big data, this is typically done via 'seeds': simple similarities (e.g. exact matches) that can be found quickly. For huge data, sparse seeding is useful, where we only consider seeds at a subset of positions in a sequence.
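To make the idea of sparse seeding concrete, here is a minimal sketch in Python. It assumes the simplest possible scheme: seeds are exact matches of length-k words, and the "subset of positions" is every `step`-th position of the reference. This fixed-stride choice is only an illustration and is not the word-selection method developed in the paper; the function names, the toy sequences, and the parameters `k` and `step` are all hypothetical.

```python
from collections import defaultdict


def build_sparse_index(reference, k, step):
    """Index the k-mers that start at positions 0, step, 2*step, ... of the reference."""
    index = defaultdict(list)
    for pos in range(0, len(reference) - k + 1, step):
        index[reference[pos:pos + k]].append(pos)
    return index


def find_seeds(query, index, k):
    """Return (query_pos, reference_pos) pairs where a query k-mer exactly matches an indexed k-mer."""
    seeds = []
    for qpos in range(len(query) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict on a miss
        for rpos in index.get(query[qpos:qpos + k], ()):
            seeds.append((qpos, rpos))
    return seeds


# Toy data (made up for illustration): the query occurs in the reference at position 6.
reference = "ACGTACGGTCAGTTGACCA"
query = "GGTCAGTT"

dense = find_seeds(query, build_sparse_index(reference, k=4, step=1), k=4)
sparse = find_seeds(query, build_sparse_index(reference, k=4, step=3), k=4)

print(dense)   # [(0, 6), (1, 7), (2, 8), (3, 9), (4, 10)]
print(sparse)  # [(0, 6), (3, 9)]
```

In this toy case the sparse index stores about a third as many words as the dense one, yet it still yields seeds on the same diagonal (reference position minus query position equals 6), so the similar region is still found with less memory and fewer hits to process.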