Turkish Text Compression via Characters Encoding

Tariq Abu Hilal; Hasan Abu Hilal


Open Access

Turkish Text Compression via Characters Encoding

Author(s) - Tariq Abu Hilal, Hasan Abu Hilal

Publication year - 2020

Publication title - Procedia Computer Science

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.334

H-Index - 76

ISSN - 1877-0509

DOI - 10.1016/j.procs.2020.07.042

Subject(s) - computer science, encoding (memory), turkish, compression (physics), data compression, artificial intelligence, natural language processing, information retrieval, composite material, philosophy, linguistics, materials science

In this paper, we propose an efficient conversion of Turkish character strings from UTF-8 to a single-byte ANSI character coding in order to save space. We also present a decoding method that transforms the encoded ANSI string back to its original format. Unlike one-byte ANSI characters, some Turkish letters are stored in 2 bytes in UTF-8, and that extra space comes at a price. The proposed sequential encoding technique reduces the size of the text file, and the encoded Turkish text is restored to its original form after decoding. The proposal is therefore a form of lossless text compression, which is a common concern today, and many parties have become interested in Unicode compression. In essence, our algorithm maps Unicode Turkish characters into ANSI by using the available legacy 8-bit codes.
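The size difference described in the abstract can be illustrated with a minimal Python sketch. It does not reproduce the authors' mapping, which is not given here. As a stand-in it assumes the standard Windows-1254 (Turkish) code page, exposed in Python as the `cp1254` codec, and the function names `encode_turkish` and `decode_turkish` are hypothetical.

```python
# Illustrative only: Windows-1254 stands in for the paper's own
# UTF-8 -> ANSI mapping, which the abstract does not specify.

def encode_turkish(text: str) -> bytes:
    """Encode text into one byte per character using code page 1254.

    Raises UnicodeEncodeError for characters outside that code page.
    """
    return text.encode("cp1254")


def decode_turkish(data: bytes) -> str:
    """Restore the original text from its single-byte encoding."""
    return data.decode("cp1254")


if __name__ == "__main__":
    sample = "çğıöşü ÇĞİÖŞÜ"  # the Turkish letters that need 2 bytes in UTF-8
    utf8_bytes = sample.encode("utf-8")
    packed = encode_turkish(sample)

    print("UTF-8 size: ", len(utf8_bytes), "bytes")
    print("ANSI size:  ", len(packed), "bytes")
    print("Round trip: ", decode_turkish(packed) == sample)
```

Because every character covered by the code page has exactly one byte value, decoding restores the input exactly, which is the lossless property the abstract refers to. Text containing characters outside the code page would need a separate escape mechanism, which this sketch does not attempt.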
