Premium
Revisiting bounded context block‐sorting transformations
Author(s) -
Culpepper J. Shane,
Petri Matthias,
Puglisi Simon J.
Publication year - 2012
Publication title -
software: practice and experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.437
H-Index - 70
eISSN - 1097-024X
pISSN - 0038-0644
DOI - 10.1002/spe.1112
Subject(s) - context (archaeology) , computer science , lexicographical order , algorithm , sorting , permutation (music) , transformation (genetics) , string (physics) , bounded function , matrix (chemical analysis) , process (computing) , block (permutation group theory) , parallel computing , arithmetic , theoretical computer science , mathematics , combinatorics , paleontology , mathematical analysis , biochemistry , mathematical physics , biology , chemistry , physics , materials science , composite material , acoustics , gene , operating system
SUMMARY The Burrows–Wheeler Transform ( BWT ) produces a permutation of a string X , denoted X ∗ , by sorting the n cyclic rotations of X into full lexicographical order and taking the last column of the resulting n × n matrix to be X ∗ . The transformation is reversible in O ( n ) time. In this paper, we consider an alteration to the process, called k ‐ BWT , where rotations are only sorted to a depth k . We propose new approaches to the forward and reverse transform, and show that the methods are efficient in practice. More than a decade ago, two algorithms were independently discovered for reversing k ‐ BWT , both of which run in O ( nk ) time. Two recent algorithms have lowered the bounds for the reverse transformation to O ( n log k ) and O ( n ) , respectively. We examine the practical performance for these reversal algorithms. We find that the original O ( nk ) approach is most efficient in practice, and investigates new approaches, aimed at further speeding reversal, which store precomputed context boundaries in the compressed file. By explicitly encoding the context boundaries, we present an O ( n ) reversal technique that is both efficient and effective. Finally, our study elucidates an inherently cache‐friendly – and hitherto unobserved – behavior in the reverse k ‐ BWT , which could lead to new applications of the k ‐ BWT transform. In contrast to previous empirical studies, we show that the partial transform can be reversed significantly faster than the full transform, without significantly affecting compression effectiveness. Copyright © 2011 John Wiley & Sons, Ltd.