Adding compression to a full‐text retrieval system
Author(s) - Justin Zobel, Alistair Moffat
Publication year - 1995
Publication title - Software: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.437
H-Index - 70
eISSN - 1097-024X
pISSN - 0038-0644
DOI - 10.1002/spe.4380250804
Subject(s) - huffman coding , computer science , compression (physics) , information retrieval , data compression , compression ratio , coding (social sciences) , text retrieval , natural language processing , artificial intelligence , mathematics , statistics , materials science , automotive engineering , engineering , composite material , internal combustion engine
We describe the implementation of a data compression scheme as an integral and transparent layer within a full‐text retrieval system. Using a semi‐static word‐based compression model, the space needed to store the text is under 30 per cent of the original requirement. The model is used in conjunction with canonical Huffman coding, and together these two paradigms provide fast decompression. Experiments with 500 Mb of newspaper articles show that in full‐text retrieval environments compression not only saves space but can also yield faster query processing: a win‐win situation.
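
As an illustration only, the combination of a semi‐static word‐based model with canonical Huffman coding might be sketched as below. This is not the authors' implementation: the function names are hypothetical, the tokenizer is an assumed simple split into alternating word and non‐word runs, a single shared model is used for both kinds of token, and the code is kept as a bit string for readability rather than packed into bytes. The first pass gathers token frequencies (the semi‐static part), and the second pass encodes with codes assigned in canonical order.

```python
import heapq
import re
from collections import Counter
from itertools import count


def tokenize(text):
    # Assumed tokenizer: alternating runs of word and non-word characters.
    return re.findall(r"\w+|\W+", text)


def code_lengths(freqs):
    # Standard Huffman construction, tracking only each symbol's code length.
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}
    tie = count()  # tie-breaker so the heap never compares symbol lists
    heap = [(f, next(tie), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1  # every merged symbol moves one level deeper
        heapq.heappush(heap, (f1 + f2, next(tie), s1 + s2))
    return lengths


def canonical_codes(lengths):
    # Canonical assignment: sort by (length, symbol) and hand out
    # consecutive integers, shifting left whenever the length grows.
    codes = {}
    code = 0
    prev_len = 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len
        codes[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes


def encode(tokens, codes):
    return "".join(codes[t] for t in tokens)


def decode(bits, codes):
    # Simple prefix-code decoder; a production decoder would instead use
    # per-length first-code tables, which is what makes canonical codes fast.
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    sample = "the cat sat on the mat the end"
    toks = tokenize(sample)                      # pass 1: gather statistics
    table = canonical_codes(code_lengths(Counter(toks)))
    packed = encode(toks, table)                 # pass 2: encode
    print(len(packed), "bits versus", 8 * len(sample), "bits uncompressed")
    print(decode(packed, table) == sample)
```

Because canonical codes are fully determined by the code lengths and the symbol ordering, a decoder in this style needs to store only the lexicon and one length per token, not an explicit code tree.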