Word-Level vs Sentence-Level Language Identification: Application to Algerian and Arabic Dialects
Author(s) - Mohamed Lichouri, Mourad Abbas, Abed Alhakim Freihat, Dhiya El Hak Megtouf
Publication year - 2018
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2018.10.484
Subject(s) - computer science, natural language processing, artificial intelligence, sentence, arabic, identification (biology), word (group theory), naive bayes classifier, support vector machine, set (abstract data type), speech recognition, linguistics, philosophy, botany, biology, programming language
In this paper, we investigate a set of methods for textual Arabic dialect identification, considering both word-level and sentence-level approaches. We used three classifiers: Linear Support Vector Machine (L-SVM), Bernoulli Naive Bayes (BNB), and Multinomial Naive Bayes (MNB), and then combined them using a voting procedure. We carried out experiments on two sets of dialects: the first, PADIC, consists of parallel sentences in Maghrebi and Middle Eastern dialects; the second is a set of Algerian dialects only, which we built manually. For the Arabic dialects, we obtained an average accuracy of 92%; for the Algerian dialects, our approach yielded an average accuracy of about 76%.
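The abstract does not say which voting rule combines the three classifiers. As a minimal sketch, assuming plain majority (hard) voting with ties resolved in favour of the first classifier listed, the combination step could look like the following. The function names, the tie-breaking rule, and the dialect labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the most frequent label among the classifiers' predictions.

    Assumption: ties are broken in favour of the earliest classifier
    in the list (the paper does not specify a tie-breaking rule).
    """
    if not predictions:
        raise ValueError("no predictions to vote on")
    counts = Counter(predictions)
    best = max(counts.values())
    # Walk the predictions in classifier order so ties resolve deterministically.
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


def combine(per_classifier: Sequence[Sequence[str]]) -> List[str]:
    """Combine label sequences; per_classifier[i][j] is classifier i's label for sample j."""
    return [majority_vote(votes) for votes in zip(*per_classifier)]


# Hypothetical predictions for three sentences (labels are made up for illustration).
svm_preds = ["ALG", "TUN", "SYR"]  # stand-in for L-SVM output
bnb_preds = ["ALG", "MOR", "PAL"]  # stand-in for BNB output
mnb_preds = ["TUN", "MOR", "MSA"]  # stand-in for MNB output

combined = combine([svm_preds, bnb_preds, mnb_preds])
print(combined)
```

In this sketch the third sentence receives three different labels, so the first classifier's prediction is kept. Putting the strongest individual classifier first is one reasonable way to use that rule, but it is a design choice made here, not one reported by the authors.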