A fast bit-vector algorithm for approximate string matching based on dynamic programming
Author(s) -
Gene Myers
Publication year - 1998
Publication title -
lecture notes in computer science
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.249
H-Index - 400
eISSN - 1611-3349
pISSN - 0302-9743
DOI - 10.1007/bfb0030777
Subject(s) - algorithm , substring , computer science , subroutine , dynamic programming , string searching algorithm , matching (statistics) , bit array , matrix (chemical analysis) , set (abstract data type) , pattern matching , mathematics , type (biology) , composite material , biology , programming language , ecology , statistics , materials science , operating system
The approximate string matching problem is to find all locations at which a query of length m matches a substring of a text of length n with k-or-fewer differences. Simple and practical bit-vector algorithms have been designed for this problem, most notably the one used in agrep. These algorithms compute a bit representation of the current state-set of the k-difference automaton for the query, and asymptotically run in O(nmk/w) time where w is the word size of the machine (e.g. 32 or 64 in practice). Here we present an algorithm of comparable simplicity that requires only O(nm/w) time by virtue of computing a bit representation of the relocatable dynamic programming matrix for the problem. Thus the algorithm's performance is independent of k, and it is found to be more efficient than the previous results for many choices of k and small m.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom