
Fast, Flexible Text Search Using Genomic Short‐Read Mapping Model
Author(s) - Kim SungHwan, Cho HwanGue
Publication year - 2016
Publication title - ETRI Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.295
H-Index - 46
eISSN - 2233-7326
pISSN - 1225-6463
DOI - 10.4218/etrij.16.0115.0594
Subject(s) - computer science, locality, information retrieval, task (project management), heuristic, tracing, process (computing), data mining, fragment (logic), artificial intelligence, algorithm, philosophy, linguistics, management, economics, operating system
Searching an extensive document database for documents that are locally similar to a given query document, and subsequently detecting the similar regions between such documents, is considered an essential task in information retrieval and data management. In this paper, we present a framework for this task. The proposed framework employs short‐read mapping, a method used in bioinformatics to reveal similarities between genomic sequences. Documents are treated as biological objects; consequently, edit operations between locally similar documents are viewed as an evolutionary process. Accordingly, we are able to apply evolution tracing to detect similar regions between documents. In addition, we propose heuristic methods to address issues that arise at different stages of the framework, for example, a frequency‐based fragment ordering method and a locality‐aware interval aggregation method. Extensive experiments covering various scenarios of this search task indicate that the proposed framework outperforms existing methods.
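
The abstract does not give implementation details, so the following is only a minimal sketch of the general seed-and-aggregate idea behind short-read mapping applied to text, not the authors' algorithm. It assumes character k-grams as the "short reads", an in-memory inverted index, a rarest-first ordering as a stand-in for frequency-based fragment ordering, and a simple gap threshold as a stand-in for locality-aware interval aggregation. All function names and parameters (`build_index`, `order_fragments`, `aggregate_hits`, `search`, `k`, `max_gap`, `max_fragments`) are hypothetical.

```python
from collections import defaultdict


def build_index(docs, k):
    """Map every k-character fragment to the (doc_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos in range(len(text) - k + 1):
            index[text[pos:pos + k]].append((doc_id, pos))
    return index


def order_fragments(query, index, k):
    """Return (fragment, query_pos) pairs found in the index, rarest fragments first.

    Assumption: rare fragments are more selective, so they are looked up first.
    """
    frags = [(query[i:i + k], i) for i in range(len(query) - k + 1)]
    frags = [(f, i) for f, i in frags if f in index]
    return sorted(frags, key=lambda fi: (len(index[fi[0]]), fi[1]))


def aggregate_hits(positions, k, max_gap):
    """Merge hit positions into half-open intervals when the gap is at most max_gap."""
    intervals = []
    for pos in sorted(positions):
        if intervals and pos - intervals[-1][1] <= max_gap:
            # Hit is close to the current interval: extend it.
            intervals[-1][1] = max(intervals[-1][1], pos + k)
        else:
            # Hit is far from the previous interval: start a new one.
            intervals.append([pos, pos + k])
    return [tuple(iv) for iv in intervals]


def search(docs, query, k=5, max_gap=3, max_fragments=50):
    """Return {doc_id: [(start, end), ...]} of regions sharing fragments with the query."""
    index = build_index(docs, k)
    hits = defaultdict(set)
    for frag, _ in order_fragments(query, index, k)[:max_fragments]:
        for doc_id, pos in index[frag]:
            hits[doc_id].add(pos)
    return {d: aggregate_hits(p, k, max_gap) for d, p in hits.items()}


if __name__ == "__main__":
    docs = [
        "the quick brown fox jumps over the lazy dog",
        "completely unrelated text here",
    ]
    for doc_id, regions in search(docs, "a quick brown fox appears").items():
        for start, end in regions:
            print(doc_id, (start, end), repr(docs[doc_id][start:end]))
```

This sketch matches fragments exactly; a framework that models edit operations between documents would additionally need to tolerate insertions, deletions, and substitutions within a region, which is omitted here.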