The minimizer Jaccard estimator is biased and inconsistent
Author(s) - Mahdi Belbasi, Antonio Blanca, Robert S. Harris, David Koslicki, Paul Medvedev
Publication year - 2022
Publication title - Bioinformatics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.599
H-Index - 390
eISSN - 1367-4811
pISSN - 1367-4803
DOI - 10.1093/bioinformatics/btac244
Subject(s) - Jaccard index, estimator, computer science, limit (mathematics), scalability, algorithm, scripting language, data mining, statistics, mathematics, pattern recognition (psychology), artificial intelligence, mathematical analysis, database, operating system
Sketching is now widely used in bioinformatics to reduce data size and increase data processing speed. Sketching approaches entice with improved scalability but also carry the danger of decreased accuracy and added bias. In this article, we investigate the minimizer sketch and its use to estimate the Jaccard similarity between two sequences.
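To make the abstract's two objects concrete, here is a minimal Python sketch of a minimizer sketch and a Jaccard estimate computed from it. It is an illustration under stated assumptions, not the authors' implementation: the function names, the BLAKE2b-based k-mer ordering, the window definition (w consecutive k-mers), and the parameters `k` and `w` are all choices made here for the example.

```python
import hashlib


def kmers(seq, k):
    """Return all k-mers of seq, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_hash(kmer):
    """Deterministic 64-bit hash used to order k-mers (an assumed choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer_sketch(seq, k, w):
    """Set of k-mers that have the smallest hash in some window of w consecutive k-mers."""
    ks = kmers(seq, k)
    sketch = set()
    for i in range(len(ks) - w + 1):
        sketch.add(min(ks[i:i + w], key=kmer_hash))
    return sketch


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; 0.0 for two empty sets (convention chosen here)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def true_jaccard(s, t, k):
    """Jaccard similarity of the full k-mer sets of s and t."""
    return jaccard(set(kmers(s, k)), set(kmers(t, k)))


def minimizer_jaccard(s, t, k, w):
    """Jaccard similarity of the minimizer sketches of s and t."""
    return jaccard(minimizer_sketch(s, k, w), minimizer_sketch(t, k, w))


if __name__ == "__main__":
    s = "ACGTACGTTGCATGCAACGTTAGC"
    t = "ACGTACGATGCATGCAACGTAAGC"
    print("k-mer Jaccard:    ", true_jaccard(s, t, k=4))
    print("minimizer Jaccard:", minimizer_jaccard(s, t, k=4, w=3))
```

With `w = 1` every k-mer is its own window, so the sketch equals the full k-mer set and the two values coincide. For larger `w` the two values can differ, which is the kind of discrepancy the article's title refers to.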