Preprocessed Text Compression Method for Malayalam Text Files
Author(s) -
Ms. Rincy T A,
R. Ramachandran
Publication year - 2019
Publication title -
International Journal of Recent Technology and Engineering (IJRTE)
Language(s) - English
Resource type - Journals
ISSN - 2277-3878
DOI - 10.35940/ijrte.b1806.078219
Subject(s) - unicode , ascii , malayalam , computer science , character (mathematics) , byte , data compression , compression (physics) , preprocessor , natural language processing , artificial intelligence , speech recognition , programming language , mathematics , materials science , geometry , composite material
The increasing importance of Unicode for text files means more storage space for data and longer transmission times, and therefore a greater need for data compression. Conventional compressors fare poorly on UTF-8 text, where each character can span multiple bytes. Malayalam, one of the four major languages of the Dravidian family, is represented using Unicode characters. The contribution of this paper is a reversible transformation that maps the input to a smaller file before a general-purpose compression method is applied. After this preprocessing, LZW achieves better compression on Malayalam text files, which may contain any characters, including ASCII characters. The method can be extended to files in any native language that consist mostly of characters from a single script.
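The abstract does not spell out the transformation, so what follows is only a minimal sketch of one possible scheme, not the authors' method. It assumes the input contains only ASCII and characters from the Unicode Malayalam block (U+0D00 to U+0D7F, 128 code points). Each Malayalam character takes three bytes in UTF-8, so the sketch stores it in a single byte in the range 0x80 to 0xFF and passes ASCII through unchanged. A textbook LZW coder then runs on the shorter byte stream. The function names `preprocess`, `postprocess`, `lzw_compress` and `lzw_decompress`, and the byte offset, are hypothetical. The paper's method handles other characters as well; this sketch instead rejects them with a `ValueError`.

```python
MAL_START, MAL_END = 0x0D00, 0x0D7F  # Unicode Malayalam block (128 code points)
BYTE_OFFSET = 0x80  # hypothetical choice: Malayalam maps to bytes 0x80-0xFF


def preprocess(text: str) -> bytes:
    """Reversibly map ASCII to itself and each Malayalam code point to one byte."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)  # ASCII passes through unchanged
        elif MAL_START <= cp <= MAL_END:
            out.append(cp - MAL_START + BYTE_OFFSET)  # 3 UTF-8 bytes -> 1 byte
        else:
            # Assumption of this sketch: no escape scheme for other scripts.
            raise ValueError(f"unsupported character U+{cp:04X}")
    return bytes(out)


def postprocess(data: bytes) -> str:
    """Invert preprocess(): bytes >= 0x80 go back to the Malayalam block."""
    return "".join(
        chr(b) if b < 0x80 else chr(b - BYTE_OFFSET + MAL_START) for b in data
    )


def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW; the dictionary starts with all 256 single bytes and is unbounded."""
    table = {bytes([i]): i for i in range(256)}
    w = b""
    codes = []
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc  # keep extending the current phrase
        else:
            codes.append(table[w])
            table[wc] = len(table)  # register the new phrase
            w = bytes([b])
    if w:
        codes.append(table[w])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the dictionary on the fly to invert lzw_compress()."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = bytearray(w)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]  # phrase being defined by this very code
        else:
            raise ValueError(f"bad LZW code {code}")
        out += entry
        table[len(table)] = w + entry[:1]
        w = entry
    return bytes(out)


if __name__ == "__main__":
    sample = "മലയാളം ഭാഷ, Malayalam text. " * 20
    raw = sample.encode("utf-8")
    pre = preprocess(sample)
    print("bytes:", len(raw), "->", len(pre))
    print("LZW codes:", len(lzw_compress(raw)), "->", len(lzw_compress(pre)))
    assert postprocess(lzw_decompress(lzw_compress(pre))) == sample
```

The mapping is one-to-one on the supported characters, so `postprocess(preprocess(s)) == s`. For text made only of Malayalam characters, LZW sees a stream one third of its UTF-8 length. A practical version would also need an escape mechanism for characters outside the two ranges and a bounded LZW code width.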