
Building a Dataset for Detecting Fake News in Amharic Language
Author(s) -
Tewodros Tazeze,
R. Raghavendra
Publication year - 2021
Publication title -
International Journal of Advanced Research in Science, Communication and Technology
Language(s) - English
Resource type - Journals
ISSN - 2581-9429
DOI - 10.48175/ijarsct-1362
Subject(s) - amharic , computer science , disinformation , social media , artificial intelligence , support vector machine , random forest , classifier (uml) , natural language processing , world wide web
The rapid growth of social media platforms has filled a gap in day-to-day information exchange. At the same time, social media has become the main arena for disseminating manipulated information widely and at an exponential rate. The fabrication of distorted information is not limited to any one language, society, or domain, as the ongoing COVID-19 pandemic has shown. The creation and propagation of fabricated news creates an urgent demand for automatically classifying and detecting such distorted news articles. Manually detecting fake news is laborious and tiresome, and the dearth of annotated fake news datasets for automating fake news detection remains a major challenge for Amharic, a low-resource language (the second most widely spoken Semitic language after Arabic). In this study, an Amharic fake news dataset is built from verified news sources and various social media pages, and six machine learning classifiers are trained: Naïve Bayes, SVM, Logistic Regression, SGD, Random Forest, and Passive Aggressive Classifier. The experimental results show that Naïve Bayes and the Passive Aggressive Classifier outperform the remaining models, with accuracy above 96% and an F1-score of 99%. The study contributes to reducing the spread of disinformation in vernacular languages.
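To make the classification step concrete, the following is a minimal sketch of one of the six model families named above, a multinomial Naïve Bayes text classifier with add-one smoothing, written in plain Python. It is not the authors' implementation: the abstract does not describe their features, preprocessing, or tooling, so the whitespace tokenizer, the `NaiveBayes` class, and the tiny English placeholder headlines are all assumptions made for illustration and are not drawn from the paper's dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Amharic text would
    # likely need punctuation and character normalization first.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus summed log likelihoods of the known tokens.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical placeholder data, NOT taken from the paper's dataset.
train_texts = [
    "ministry confirms new vaccine schedule",
    "official report on school reopening",
    "miracle herb cures virus overnight",
    "secret cure hidden by doctors",
]
train_labels = ["real", "real", "fake", "fake"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("miracle cure for virus")
```

With a labeled corpus in place of the placeholder lists, the same `fit` and `predict` interface could be reused to compare against the other classifiers, for example by holding out a test split and computing accuracy and F1-score per model.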