Open Access
Fast and Flexible Compression for Web Search Engines
Author(s) -
Antonio Fariña,
Nieves R. Brisaboa,
Cristina París,
José R. Paramá
Publication year - 2005
Publication title -
Electronic Notes in Theoretical Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.242
H-Index - 60
ISSN - 1571-0661
DOI - 10.1016/j.entcs.2004.09.043
Subject(s) - computer science , compression (physics) , lossless compression , data compression , word (group theory) , compression ratio , code (set theory) , search engine , information retrieval , theoretical computer science , algorithm , programming language , mathematics , engineering , materials science , geometry , set (abstract data type) , automotive engineering , composite material , internal combustion engine
In this paper we present the adaptation of a compression technique, specially designed to compress large textual databases, to the peculiarities of web search engines.

The (s,c)-Dense Code belongs to a new category of compression techniques [Silva de Moura, E., G. Navarro, N. Ziviani and R. Baeza-Yates, Fast and flexible word searching on compressed text, ACM Transactions on Information Systems 18 (2000), pp. 113–139; Brisaboa, N., A. Fariña, G. Navarro and M. Esteller, (s,c)-dense coding: An optimized compression code for natural language text databases, in: Proc. 10th International Symposium on String Processing and Information Retrieval (SPIRE 2003), LNCS 2857, 2003, pp. 122–136] that allow fast and flexible search directly on compressed files. However, these methods are only suitable for large natural language texts of at least 1 megabyte; on smaller texts they do not achieve an attractive amount of compression.

In order to take advantage of the search capabilities of these techniques (they allow searches on compressed files up to eight times faster than searching on the plain versions [Silva de Moura, E., G. Navarro, N. Ziviani and R. Baeza-Yates, Fast and flexible word searching on compressed text, ACM Transactions on Information Systems 18 (2000), pp. 113–139]), we present a modification of the basic compression technique, the (s,c)-Dense Code, that achieves reasonable compression ratios with small files, a requirement when working with search engines.
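The abstract does not spell out how an (s,c)-Dense Code assigns codewords, so the following is only a minimal sketch of the basic scheme as it is usually described in the cited SPIRE 2003 paper, not the small-file modification this paper proposes. It assumes that words are ranked by decreasing frequency (rank 0 is the most frequent), that byte values 0 to s-1 act as "stoppers" and s to 255 as "continuers" with s + c = 256, and that each codeword is zero or more continuers followed by exactly one stopper. The function names and the default s = 230 are hypothetical and chosen only for illustration; in practice s would be tuned to the word frequency distribution.

```python
def encode_rank(rank, s=230, c=26):
    """Return the byte codeword for a 0-based frequency rank (illustrative sketch)."""
    if rank < 0 or s < 1 or c < 1 or s + c != 256:
        raise ValueError("need rank >= 0, s >= 1, c >= 1 and s + c == 256")
    out = [rank % s]            # last byte: a stopper in [0, s)
    x = rank // s
    while x > 0:
        x -= 1
        out.append(s + x % c)   # earlier bytes: continuers in [s, 256)
        x //= c
    return bytes(reversed(out))


def decode_codeword(code, s=230, c=26):
    """Invert encode_rank: map one complete codeword back to its rank."""
    x = 0
    for b in code[:-1]:         # every byte except the last is a continuer
        x = x * c + (b - s) + 1
    return x * s + code[-1]     # the last byte is the stopper
```

Under these assumptions the s most frequent words receive one-byte codewords and the next s*c words receive two-byte codewords. Because only the final byte of a codeword is a stopper, codeword boundaries can be recognized inside the compressed stream, which is the property that makes searching directly on compressed text feasible.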
