Word-Level vs Sentence-Level Language Identification: Application to Algerian and Arabic Dialects
Author(s) - Mohamed Lichouri, Mourad Abbas, Abed Alhakim Freihat, Dhiya El Hak Megtouf
Publication year - 2018
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2018.10.484
Subject(s) - computer science, natural language processing, artificial intelligence, sentence, arabic, identification (biology), word (group theory), naive bayes classifier, support vector machine, set (abstract data type), speech recognition, linguistics, philosophy, botany, biology, programming language
In this paper, we investigate a set of methods for textual Arabic dialect identification, considering both word-level and sentence-level approaches. We used three classifiers: Linear Support Vector Machine (L-SVM), Bernoulli Naive Bayes (BNB), and Multinomial Naive Bayes (MNB), and then combined them using a voting procedure. We carried out experiments on two sets of dialects: the first, PADIC, consists of parallel sentences in Maghrebi and Middle Eastern dialects; the second is a set of Algerian dialects only, which we built manually. For the Arabic dialects, we obtained an average accuracy of 92%. For the Algerian dialects, our approach yielded an average accuracy of about 76%.
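
The abstract does not say how the voting procedure works. As a rough illustration only, the sketch below assumes simple hard (majority) voting over the labels that three classifiers predict for each sentence. The function names, the dialect tags, and the tie-breaking rule (fall back to the first classifier listed) are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label in `labels`.

    Assumption: ties are broken in favour of the label that appears
    first in the list, i.e. the first classifier's prediction.
    """
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def vote_per_sentence(predictions_by_classifier):
    """Combine per-sentence predictions from several classifiers.

    `predictions_by_classifier` maps a classifier name to a list of
    predicted labels, one per sentence, all lists of equal length.
    """
    per_sentence = zip(*predictions_by_classifier.values())
    return [majority_vote(list(labels)) for labels in per_sentence]


# Hypothetical predictions for three sentences; the tags are invented.
predictions = {
    "L-SVM": ["ALG", "SYR", "TUN"],
    "BNB": ["ALG", "PAL", "MOR"],
    "MNB": ["TUN", "SYR", "ALG"],
}
combined = vote_per_sentence(predictions)
print(combined)
```

In the third sentence all three classifiers disagree, so under the assumed tie-breaking rule the combined label is the one from the first classifier listed. A weighted or probability-based (soft) vote would be an equally plausible reading of the abstract.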
