
Text Language Identification Using Letters (Frequency, Self-information, and Entropy) Analysis for English, French, and German Languages
Author(s) -
Rasha Hassan Abbas,
Firas Abdul Elah Abdul Kareem
Publication year - 2019
Publication title -
Xi'nan Jiaotong Daxue Xuebao (Journal of Southwest Jiaotong University)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.308
H-Index - 21
ISSN - 0258-2724
DOI - 10.35741/issn.0258-2724.54.4.21
Subject(s) - computer science , german , categorization , natural language processing , the internet , chaining , artificial intelligence , linguistics , language identification , identification (biology) , world wide web , natural language , psychology , philosophy , psychotherapist , botany , biology
People describe the world, tell stories, share ideas, and communicate in more than 6,900 languages, and the information available on the Internet can seem unlimited. Throughout history, electrical and computer engineers have built tools that help people communicate, such as the telephone, the telegraph, and the Internet router. Software that translates between languages is one such tool, and the first step in translating a text is identifying its language. In this research, a program that automatically identifies the language of a text was designed and tested for English, French, and German. It relies on letter statistics: the frequency, self-information, and entropy of certain selected letters. When applied to randomly selected text files, the program successfully detected the original language of these texts. The detection program was written in C++.
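The abstract names the three letter statistics but not how they are combined, so the following is only a minimal C++ sketch of the general idea under stated assumptions, not the authors' program. For a letter x with relative frequency p(x), the self-information is I(x) = -log2 p(x) bits, and the entropy of the letter distribution is its expected self-information, H = -Σ p(x) log2 p(x). The names `LetterProfile`, `buildProfile`, `selfInformation`, `profileDistance`, `identifyLanguage`, and the `chosenLetters` parameter are hypothetical. The decision rule (pick the reference language whose profile is nearest, measured as the sum of absolute frequency differences over the chosen letters plus the absolute entropy difference) is an assumption, and the sketch counts only ASCII letters a to z.

```cpp
#include <array>
#include <cmath>
#include <limits>
#include <map>
#include <string>

// Letter statistics for one text (hypothetical structure for this sketch).
// Only ASCII letters a..z are counted, case-insensitively; accented letters
// such as é or ß, and all other bytes, are ignored.
struct LetterProfile {
    std::array<double, 26> frequency{};  // relative frequency p(x) of each letter
    double entropy = 0.0;                // H = -sum p(x) log2 p(x), in bits
};

// Self-information of a letter with probability p, in bits: I(x) = -log2 p(x).
// Returns +infinity for a letter that never occurs (p == 0).
double selfInformation(double p) {
    return p > 0.0 ? -std::log2(p) : std::numeric_limits<double>::infinity();
}

// Counts letters, converts counts to relative frequencies, and computes the
// entropy as the frequency-weighted average of the letters' self-information.
LetterProfile buildProfile(const std::string& text) {
    LetterProfile profile;
    std::array<double, 26> counts{};
    double total = 0.0;
    for (char c : text) {
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c - 'A' + 'a');
        if (c >= 'a' && c <= 'z') {
            counts[c - 'a'] += 1.0;
            total += 1.0;
        }
    }
    if (total == 0.0) return profile;  // no letters: all-zero profile
    for (int i = 0; i < 26; ++i) {
        const double p = counts[i] / total;
        profile.frequency[i] = p;
        // Letters with p == 0 contribute nothing to the entropy.
        if (p > 0.0) profile.entropy += p * selfInformation(p);
    }
    return profile;
}

// Assumed decision rule: sum of absolute frequency differences over the
// chosen letters plus the absolute entropy difference. Characters in
// chosenLetters outside a..z are skipped.
double profileDistance(const LetterProfile& a, const LetterProfile& b,
                       const std::string& chosenLetters) {
    double d = std::fabs(a.entropy - b.entropy);
    for (char c : chosenLetters) {
        if (c < 'a' || c > 'z') continue;
        const int i = c - 'a';
        d += std::fabs(a.frequency[i] - b.frequency[i]);
    }
    return d;
}

// Returns the name of the reference profile nearest to the input text,
// or an empty string if no references are given.
std::string identifyLanguage(const std::string& text,
                             const std::map<std::string, LetterProfile>& references,
                             const std::string& chosenLetters) {
    const LetterProfile sample = buildProfile(text);
    std::string best;
    double bestDistance = std::numeric_limits<double>::infinity();
    for (const auto& [language, reference] : references) {
        const double d = profileDistance(sample, reference, chosenLetters);
        if (d < bestDistance) {
            bestDistance = d;
            best = language;
        }
    }
    return best;
}
```

In use, one would call `buildProfile` on a sizeable sample of known English, French, and German text to fill the `references` map, then call `identifyLanguage` on each unknown file. Which letters to pass as `chosenLetters` is a tuning choice that the abstract does not specify. Because accented letters (é, è, ä, ö, ü, ß) are strong cues for French and German, a fuller version would decode UTF-8 and count them rather than skip them as this sketch does. The code uses structured bindings, so it needs C++17.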