Psychosocial Features for Identifying Hate Speech in Social Media Text
Author(s) -
Edward Ombui,
Lawrence Muchemi,
Peter Waiganjo Wagacha
Publication year - 2021
Publication title -
Journal of Education, Society and Behavioural Science
Language(s) - English
Resource type - Journals
ISSN - 2456-981X
DOI - 10.9734/jesbs/2021/v34i1230382
Subject(s) - latent dirichlet allocation , computer science , social media , feature (linguistics) , natural language processing , topic model , preprocessor , artificial intelligence , set (abstract data type) , speech recognition , linguistics , world wide web , philosophy , programming language
This study uses natural language processing to identify hate speech in codeswitched social media text. It trains nine models and tests how well they predict hate speech in a 50k human-annotated dataset. The article proposes a novel hierarchical approach that uses Latent Dirichlet Allocation (LDA) to develop topic models, which in turn help build a high-level psychosocial feature set called PDC. PDC organizes words into word families, which helps capture codeswitching during preprocessing for supervised learning models. The PDC features are informed by the duplex theory of hate and based on a hate speech annotation framework. Frequency-based models employing the PDC features on tweets from the 2012 and 2017 Kenyan presidential elections yielded an f-score of 83 percent (precision: 81 percent, recall: 85 percent) in recognizing hate speech. The study is notable for two reasons. First, it publicly releases a rich codeswitched dataset for comparative studies. Second, it describes how to create a novel PDC feature set to detect subtle types of hate speech hidden in codeswitched data that previous approaches could not detect.
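The reported f-score can be checked against the reported precision and recall. The sketch below assumes the abstract's "f-score" is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is illustrative and not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "f-score" is F1 (beta = 1).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract: precision 81 percent, recall 85 percent.
reported = f_score(0.81, 0.85)
print(f"F1 = {reported:.4f}")  # prints "F1 = 0.8295", about 83 percent
```

Under that assumption, the three figures are mutually consistent: 2 × 0.81 × 0.85 / (0.81 + 0.85) ≈ 0.83.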
