Premium
Dictionary lookup with one genome evolution operation
Author(s) -
Zhang Meng,
Zhang Yi,
Sun Yuming
Publication year - 2020
Publication title -
concurrency and computation: practice and experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5840
Subject(s) - substring , transposition (logic) , string (physics) , edit distance , computer science , combinatorics , constant (computer programming) , range (aeronautics) , space (punctuation) , mathematics , algorithm , data structure , artificial intelligence , materials science , composite material , mathematical physics , programming language , operating system
Summary Given an m ‐length query string q , approximate dictionary lookup searches for strings in a string dictionary D at a distance of 1 to q under some distances. In biological retrieval systems, the distances in such queries are defined by evolution operations on genomes. We consider the approximate dictionary lookup with one genome evolution operation including reversal and transposition, which searches for strings in D that can be generated from q by one reversal or one transposition. When the length of the reversed substring is confined to a constant α >1, we propose an O ( m )‐time approach which uses O ( ( | D | − α d ) log | D | ) bits space, where the dictionary D has d strings with totally | D | symbols. If the lengths of the reversals are in a range [ α , β ], the time for query is O ( ( β − α + 1 ) m log log | D | + occ ) , and the space is O ( | D | log ε ( | D | ) ) words for any constant ε , in which occ is the number of matched strings. For problems allowing one transposition, when the length of the transposition is fixed to α , the time for a dictionary lookup is O ( α m log log | D | + o c c ) , while using O ( | D | log ε ( | D | ) ) words. In the case that the two swapped substrings are of the same length, the time for answering the query is O ( m ), while the space is O ( ( | D | − α d ) log | D | ) bits.