Efficient top-k algorithms for approximate substring matching | Zendy

Younghoon Kim | Zendy; Kyuseok Shim | Zendy

AI Assistant Blog Pricing

Home ZAIA Blog

Open Access

Efficient top-k algorithms for approximate substring matching

Author(s) -

Younghoon Kim,

Kyuseok Shim

Publication year - 2013

Publication title -

citeseer x (the pennsylvania state university)

Language(s) - English

Resource type - Conference proceedings

DOI - 10.1145/2463676.2465324

Subject(s) - substring , matching (statistics) , computer science , range (aeronautics) , similarity (geometry) , algorithm , string searching algorithm , pattern matching , theoretical computer science , mathematics , artificial intelligence , data structure , statistics , programming language , materials science , image (mathematics) , composite material

There is a wide range of applications that require to query a large database of texts to search for similar strings or substrings. Traditional approximate substring matching requests a user to specify a similarity threshold. Without top-k approximate substring matching, users have to try repeatedly different maximum distance threshold values when the proper threshold is unknown in advance. In our paper, we first propose the efficient algorithms for finding the top-k approximate substring matches with a given query string in a set of data strings. To reduce the number of expensive distance computations, the proposed algorithms utilize our novel filtering techniques which take advantages of q-grams and inverted q-gram indexes available. We conduct extensive experiments with real-life data sets. Our experimental results confirm the effectiveness and scalability of our proposed algorithms.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.

Having issues? You can contact us here

Accelerating Research