Open Access
Turkish Text Compression via Characters Encoding
Author(s) - Tariq Abu Hilal, Hasan Abu Hilal
Publication year - 2020
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2020.07.042
Subject(s) - computer science, encoding (memory), turkish, compression (physics), data compression, artificial intelligence, natural language processing, information retrieval, composite material, philosophy, linguistics, materials science
In this paper, we propose an efficient conversion of Turkish character strings from UTF-8 to ANSI character coding in order to save space. We also present a decoding method that transforms the encoded ANSI string back to its original format. Unlike one-byte ANSI characters, some letters of the Turkish alphabet are stored in two bytes, and that extra space comes at a price. The sequential encoding technique we developed reduces the size of the text file, and the encoded Turkish text retains its original form after decoding. Our proposal is therefore a lossless text compression method, which is a common concern today; many parties have become interested in Unicode compression. In essence, our algorithm maps Unicode Turkish characters into ANSI by using the available 8-bit legacy encoding.
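
The abstract does not give the authors' actual mapping, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes Python's built-in `cp1254` codec (Windows-1254, the Turkish "ANSI" code page) as a stand-in for the one-byte target encoding; the function names `encode_to_ansi` and `decode_from_ansi` and the sample string are hypothetical.

```python
import unicodedata


def encode_to_ansi(text: str) -> bytes:
    """Encode text as one byte per character using cp1254 (an assumption,
    not the paper's mapping). Raises UnicodeEncodeError for characters
    that have no cp1254 code point."""
    # Normalize to NFC so letters such as "ş" are single code points
    # rather than a base letter plus a combining mark.
    return unicodedata.normalize("NFC", text).encode("cp1254")


def decode_from_ansi(data: bytes) -> str:
    """Inverse of encode_to_ansi: map each byte back to its character."""
    return data.decode("cp1254")


sample = "çğıöşü ÇĞİÖŞÜ"           # Turkish-specific letters plus a space
utf8_bytes = sample.encode("utf-8")  # each Turkish letter here takes 2 bytes
ansi_bytes = encode_to_ansi(sample)  # every character takes 1 byte
restored = decode_from_ansi(ansi_bytes)

print("UTF-8 size:", len(utf8_bytes))
print("ANSI size: ", len(ansi_bytes))
print("Round trip is lossless:", restored == unicodedata.normalize("NFC", sample))
```

The saving comes only from characters that need two bytes in UTF-8, so plain ASCII text would not shrink under this sketch, and text containing characters outside the code page could not be encoded at all.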
