Open Access
EFFICIENT PREPROCESSING FOR WEB LOG COMPRESSION
Author(s) - Sebastian Deorowicz, Szymon Grabowski
Publication year - 2014
Publication title - Computing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.184
H-Index - 11
eISSN - 2312-5381
pISSN - 1727-6209
DOI - 10.47839/ijc.7.1.487
Subject(s) - web log analysis software, computer science, lossless compression, byte, data compression, transaction log, preprocessor, compression ratio, timestamp, upload, prefix, database, web server, data mining, operating system, the internet, algorithm, real time computing, artificial intelligence, web api, database transaction, automotive engineering, engineering, internal combustion engine, linguistics, philosophy
Web log files, storing user activity on a server, may grow at a pace of hundreds of megabytes a day, or even more, on popular sites. They are usually archived, as this enables further analysis, e.g., for detecting attacks or other server abuse patterns. In this work we present a specialized lossless preprocessor for Apache web logs and test it in combination with several popular general-purpose compressors. Our method works on the individual fields of the log data (each storing information such as the client's IP address, date/time, requested file or query, download size in bytes, etc.) and uses compression techniques such as finding and extracting common prefixes and suffixes, dictionary-based phrase sequence substitution, move-to-front coding, and more. The test results show that the proposed transform improves the average compression ratio 2.70 times in the case of gzip and 1.86 times in the case of bzip2.

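As a rough illustration of two of the ideas listed in the abstract (splitting log lines into fields and move-to-front coding of a repetitive field), here is a minimal Python sketch. It is not the authors' implementation; the regular expression, field names, sample lines, and the simplified MTF variant are assumptions for illustration only.

```python
# Minimal sketch: field-wise splitting of Apache-style log lines plus
# move-to-front (MTF) coding of one repetitive field before compression.
# Regex, field names, and sample data are illustrative assumptions.

import re

# Assumed layout resembling the Apache common log format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def split_fields(line):
    """Split one log line into a dict of fields; return None if it does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

def mtf_encode(symbols):
    """Move-to-front code a sequence of symbols: recently seen values map to
    small integers, which back-end compressors (gzip/bzip2) encode cheaply.
    (A real codec would also transmit each new symbol itself.)"""
    table = []
    out = []
    for s in symbols:
        if s in table:
            i = table.index(s)
            out.append(i)
            table.pop(i)
        else:
            out.append(len(table))  # first occurrence: emit current table size
        table.insert(0, s)          # move (or insert) the symbol to the front
    return out

if __name__ == "__main__":
    lines = [
        '10.0.0.1 - - [01/Jan/2008:00:00:01 +0100] "GET /index.html HTTP/1.1" 200 5120',
        '10.0.0.2 - - [01/Jan/2008:00:00:02 +0100] "GET /logo.png HTTP/1.1" 200 917',
        '10.0.0.1 - - [01/Jan/2008:00:00:03 +0100] "GET /index.html HTTP/1.1" 200 5120',
    ]
    fields = [split_fields(l) for l in lines]
    ips = [f["ip"] for f in fields if f]
    print(mtf_encode(ips))  # [0, 1, 1] -- the repeated IP becomes a small rank
```

Applying such per-field transforms and then feeding the streams to a general-purpose compressor is the overall pattern the paper evaluates; the concrete dictionaries, prefix/suffix extraction, and stream layout used by the authors are described in the full text.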