Open Access
Lower Density Selection Schemes via Small Universal Hitting Sets with Short Remaining Path Length
Author(s) -
Hongyu Zheng,
Carl Kingsford,
Guillaume Marçais
Publication year - 2021
Publication title -
journal of computational biology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.585
H-Index - 95
eISSN - 1557-8666
pISSN - 1066-5277
DOI - 10.1089/cmb.2020.0432
Subject(s) - path (computing) , generalization , sequence (biology) , mathematics , string (physics) , set (abstract data type) , path length , scheme (mathematics) , selection (genetic algorithm) , upper and lower bounds , word (group theory) , mathematical optimization , combinatorics , computer science , mathematical analysis , geometry , mathematical physics , genetics , programming language , computer network , artificial intelligence , biology
Universal hitting sets (UHS) are sets of words that are unavoidable: every long enough sequence is hit by the set (i.e., it contains a word from the set). There is a tight relationship between UHS and minimizer schemes, where minimizer schemes with low density (i.e., efficient schemes) correspond to UHS of small size. Local schemes are a generalization of minimizer schemes that can be used as replacement for minimizer scheme with the possibility of being much more efficient. We establish the link between efficient local schemes and the minimum length of a string that must be hit by a UHS. We give bounds for the remaining path length of the Mykkeltveit UHS. In addition, we create a local scheme with the lowest known density that is only a log factor away from the theoretical lower bound.