Computing all Repeats Using Suffix Arrays
Author(s) -
Frantisek Franek,
William F. Smyth,
Yudong Tang
Publication year - 2003
Publication title -
j. autom. lang. comb.
Language(s) - English
DOI - 10.25596/jalc-2003-579
We describe an algorithm that identifies all the repeating substrings (tandem, overlapping, and split) in a given string x = X[1..n]. Given the suffix arrays of x and of the reversed string x, the algorithm requires Θ(n) time for its execution and represents its output in Θ(n) space, either as a reduced suffix array (called an NE array) or as a reduced suffix tree (called an NE tree). The output substrings u are nonextendible (NE); that is, any extension of some occurrence of u in x, either to the left or to the right, yields a string (λu or uλ) that is unequal to the same extension of some other occurrence of u. Thus the number of substrings output is the minimum required to identify all the repeating substrings in x. The output can be used in a straightforward way to identify only repeating substrings that satisfy some proximity or minimum length condition.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom