Compact inverted index storage using general‐purpose compression libraries
Author(s) - Petri Matthias, Moffat Alistair
Publication year - 2018
Publication title - Software: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.437
H-Index - 70
eISSN - 1097-024X
pISSN - 0038-0644
DOI - 10.1002/spe.2556
Subject(s) - computer science, implementation, inverted index, decoding methods, compression (physics), index (typography), data compression, information retrieval, key (lock), task (project management), keyword search, data mining, algorithm, world wide web, operating system, search engine indexing, software engineering, materials science, management, economics, composite material
Summary - Efficient storage of large inverted indexes is one of the key technologies that support current web search services. Here we re-examine mechanisms for representing document-level inverted indexes and within-document term frequencies, including comparing specialized methods developed for this task against recent fast implementations of general-purpose adaptive compression techniques. Experiments with the Gov2-URL collection and a large collection of crawled news stories show that standard compression libraries can provide compression effectiveness as good as or better than previous methods, with decoding rates only moderately slower than reference implementations of those tailored approaches. This surprising outcome means that high-performance index compression can be achieved without requiring the use of specialized implementations.
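
The general idea in the summary, storing a postings list with an off-the-shelf compressor, can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not name the libraries or the byte layout they used, so the choice of Python's `zlib` module, the gap (difference) transform, the fixed 32-bit little-endian packing, and the function names `compress_postings` and `decompress_postings` are all assumptions made for illustration.

```python
import struct
import zlib


def to_gaps(doc_ids):
    """Turn a strictly increasing list of document ids into gaps (differences)."""
    gaps = []
    prev = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps):
    """Invert to_gaps: prefix sums recover the original document ids."""
    doc_ids = []
    total = 0
    for gap in gaps:
        total += gap
        doc_ids.append(total)
    return doc_ids


def compress_postings(doc_ids, level=6):
    """Hypothetical helper: pack gaps as 32-bit little-endian ints, then zlib them."""
    gaps = to_gaps(doc_ids)
    raw = struct.pack(f"<{len(gaps)}I", *gaps)
    return zlib.compress(raw, level)


def decompress_postings(blob):
    """Hypothetical helper: undo compress_postings."""
    raw = zlib.decompress(blob)
    count = len(raw) // 4  # each gap occupies 4 bytes
    gaps = struct.unpack(f"<{count}I", raw)
    return from_gaps(gaps)


if __name__ == "__main__":
    postings = list(range(3, 30000, 3))  # synthetic postings list, for illustration only
    blob = compress_postings(postings)
    assert decompress_postings(blob) == postings
    print(len(postings) * 4, "raw bytes ->", len(blob), "compressed bytes")
```

The synthetic list above is far more regular than real postings data, so the sizes it prints say nothing about the effectiveness reported in the paper; the sketch only shows the shape of the approach.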
