A Data-Aware FM-index
Author(s) -
Hongwei Huo,
Longgang Chen,
Heng Zhao,
Jeffrey Scott Vitter,
Yakov Nekrich,
Qiang Yu
Publication year - 2014
Publication title -
society for industrial and applied mathematics ebooks
Language(s) - English
Resource type - Book series
DOI - 10.1137/1.9781611973754.2
Subject(s) - search engine indexing , computer science , code (set theory) , index (typography) , algorithm , wavelet , data compression , entropy (arrow of time) , tree (set theory) , theoretical computer science , wavelet transform , data mining , artificial intelligence , mathematics , mathematical analysis , physics , set (abstract data type) , quantum mechanics , world wide web , programming language
In this paper we present some practical modifications of the higher-order entropy-compressed text indexing method of Foschini et al. [6] based upon the Burrows-Wheeler transform and the FM-index. Our method, called FM-Adaptive, applies a wavelet tree to the entire BWT. It partitions each bit vector of nodes in the wavelet tree into blocks and applies the hybrid encoding along with run-length Gamma code rather than the fixed-length code of [14] to each block while explores data-aware compression. FM-Adaptive retains the theoretical performance of previous work and introduces some improvements in practice. At the same time, broad experiments indicate that our index achieves superior performance, especially in terms of compression, in comparison to the state-of-the-art indexing techniques. The source code is available online.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom