
Auto-Off ID: Automatic Detection of Offensive Language in Social Media
Author(s) - R. Geetha, S. Karthika, Chaluvadi Jwala Sowmika, Bharathi M. Janani
Publication year - 2021
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1911/1/012012
Subject(s) - offensive, computer science, machine learning, artificial intelligence, social media, language identification, popularity, support vector machine, identification (biology), the internet, natural language processing, world wide web, natural language, psychology, engineering, social psychology, botany, operations research, biology
As the popularity of social media grows, computer-mediated anonymity allows users to engage in activities that they would not engage in offline. This makes users vulnerable to abuse through Internet platforms. Because of the enormous volume of social media data, it is not possible to manually filter out the abusive content that floods online communities and social networking sites. This work proposes a multi-level classification model that deploys various machine learning and deep learning models to identify offensive content in a tweet. The proposed Auto-Off ID system works in three levels: it classifies tweets as offensive or non-offensive; it filters out the offensive tweets and classifies them as either targeted or non-targeted; and it filters out the targeted tweets and identifies mentions of the individuals and organizations who have been bullied. The study combines text analysis features with lexicon features obtained using LIWC, POS tags for primary and secondary users, and Twitter Tag Scores (TTS). The system is evaluated with a diverse set of machine learning and deep learning models. The results show that C-LSTM performs best for offensive language identification, with an accuracy of 91.72%, and that LDA + Logistic Regression training with SVM reaches an accuracy of 90.87% for offensive tweet classification.
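The three-level cascade described in the abstract can be sketched roughly as follows. This is only an illustration of the control flow, not the authors' implementation: the functions `is_offensive`, `is_targeted`, and `extract_targets`, the word list `OFFENSIVE_WORDS`, and the mention pattern are hypothetical stand-ins for the trained classifiers (such as C-LSTM or SVM) and the feature extraction the paper relies on.

```python
import re
from typing import Dict, List

# Hypothetical stand-ins for trained models. The paper uses machine learning
# and deep learning classifiers; simple rules are used here only to make the
# cascade runnable.
OFFENSIVE_WORDS = {"idiot", "stupid", "loser"}
MENTION = re.compile(r"@\w+")


def tokenize(tweet: str) -> List[str]:
    """Lowercase the tweet and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", tweet.lower())


def is_offensive(tweet: str) -> bool:
    """Level 1 (assumed rule): offensive if any token is in the word list."""
    return any(token in OFFENSIVE_WORDS for token in tokenize(tweet))


def is_targeted(tweet: str) -> bool:
    """Level 2 (assumed rule): targeted if it has an @mention or says 'you'."""
    return bool(MENTION.search(tweet)) or "you" in tokenize(tweet)


def extract_targets(tweet: str) -> List[str]:
    """Level 3 (assumed rule): return the @mentions found in the tweet."""
    return MENTION.findall(tweet)


def classify(tweet: str) -> Dict[str, object]:
    """Run the cascade; each level only sees tweets passed on by the previous one."""
    result: Dict[str, object] = {"offensive": False, "targeted": None, "targets": []}
    if not is_offensive(tweet):
        return result  # non-offensive tweets stop at level 1
    result["offensive"] = True
    result["targeted"] = is_targeted(tweet)
    if result["targeted"]:
        result["targets"] = extract_targets(tweet)
    return result


if __name__ == "__main__":
    for text in ["Lovely weather today",
                 "what a stupid movie",
                 "@bob you are an idiot"]:
        print(text, "->", classify(text))
```

In a setup like this, each rule-based function could be replaced by a trained model's prediction call without changing `classify`, which is the main appeal of a cascaded design: later levels are trained and evaluated only on the tweets that earlier levels let through.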