z-logo
open-access-imgOpen Access
Towards Accurate Detection of Offensive Language in Online Communication in Arabic
Author(s) -
Azalden Alakrot,
Liam Murray,
Nikola S. Nikolov
Publication year - 2018
Publication title -
procedia computer science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2018.10.491
Subject(s) - computer science , offensive , arabic , classifier (uml) , natural language processing , artificial intelligence , support vector machine , variety (cybernetics) , speech recognition , linguistics , philosophy , management , economics
We present the results of predictive modelling for the detection of anti-social behaviour in online communication in Arabic, such as comments which contain obscene or offensive words and phrases. We collected and labelled a large dataset of YouTube comments in Arabic which contains a broad range of both offensive and inoffensive comments. We used this dataset to train a Support Vector Machine classifier and experimented with combinations of word-level features, N-gram features and a variety of pre-processing techniques. We summarise the pre-processing steps and features that allow training a classifier which is more precise, with 90.05% accuracy, than classifiers reported by previous studies on Arabic text.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here
Accelerating Research

Address

John Eccles House
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom