z-logo
Premium
LZgrep: a Boyer–Moore string matching tool for Ziv–Lempel compressed text
Author(s) -
Navarro Gonzalo,
Tarhio Jorma
Publication year - 2005
Publication title -
software: practice and experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.437
H-Index - 70
eISSN - 1097-024X
pISSN - 0038-0644
DOI - 10.1002/spe.663
Subject(s) - computer science , unix , string searching algorithm , string (physics) , approximate string matching , matching (statistics) , pattern matching , simple (philosophy) , theoretical computer science , algorithm , programming language , software , mathematics , philosophy , statistics , epistemology , mathematical physics
Abstract We present a Boyer–Moore (BM) approach to string matching over LZ78 and LZW compressed text. The idea is to search the text directly in compressed form instead of decompressing and then searching it. We modify the BM approach so as to skip text using the characters explicitly represented in the LZ78/LZW formats, modifying the basic technique where the algorithm can choose which characters to inspect. We present and compare several solutions for single and multipattern searches. We show that our algorithms obtain speedups of up to 50% compared to the simple decompress‐then‐search approach. Finally, we present a public tool, LZgrep , which uses our algorithms to offer grep ‐like capabilities directly searching files compressed using Unix's Compress , a LZW compressor. LZgrep can also search files compressed with Unix gzip , using new decompress‐then‐search techniques we develop, which are faster than the current tools. This way, users can always keep their files in compressed form and still search them, uncompressing only when they want to see them. Copyright © 2005 John Wiley & Sons, Ltd.

This content is not available in your region!

Continue researching here.

Having issues? You can contact us here