Psychosocial Features for Hate Speech Detection in Code-switched Texts

Edward Ombui, Lawrence Muchemi, Peter Waiganjo Wagacha


Open Access


Publication year - 2021

Publication title - International Journal of Information Technology and Computer Science

Language(s) - English

Resource type - Journals

eISSN - 2074-9015

pISSN - 2074-9007

DOI - 10.5815/ijitcs.2021.06.03

Subject(s) - computer science, natural language processing, feature (linguistics), latent dirichlet allocation, preprocessor, set (abstract data type), artificial intelligence, topic model, identification (biology), code (set theory), recall, speech recognition, linguistics, psychology, cognitive psychology, philosophy, botany, biology, programming language

This study examines the problem of identifying hate speech in code-switched text from social media using a natural language processing approach. It explores different features in training nine models and empirically evaluates their predictiveness in identifying hate speech in a human-annotated dataset of approximately 50k tweets. The study introduces a novel hierarchical approach that employs Latent Dirichlet Allocation to generate topic models, which in turn help build a high-level psychosocial feature set that we abbreviate as PDC. PDC groups words of similar meaning into word families, which is significant in capturing code-switching during the preprocessing stage for supervised learning models. The high-level PDC features are based on a hate speech annotation framework [1] that is largely informed by the duplex theory of hate [2]. Results obtained from frequency-based models using the PDC feature on the dataset, which comprises tweets generated during the 2012 and 2017 presidential elections in Kenya, indicate an F-score of 83% (precision: 81%, recall: 85%) in identifying hate speech. The study is significant in two ways. First, it publicly shares a unique code-switched dataset for hate speech that is valuable for comparative studies. Second, it provides a methodology for building a novel PDC feature set to identify nuanced forms of hate speech, camouflaged in code-switched data, which conventional methods could not adequately identify.
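
The word-family idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the family labels (`THEFT`, `EXPULSION`), the word lists, and the helper names `family_counts` and `f_score` are all hypothetical and chosen only for illustration. The sketch assumes a hand-written lexicon, whereas the study derives its families with the help of topic models. It maps English and Swahili surface forms to a shared label, counts the labels as frequency features, and then checks that the reported precision and recall are consistent with the balanced F1 definition, F1 = 2PR / (P + R).

```python
import re
from collections import Counter

# Hypothetical word families: each label groups surface forms from
# different languages that carry a similar meaning. These entries are
# illustrative only and are NOT taken from the paper's lexicon.
WORD_FAMILIES = {
    "THEFT": {"thief", "thieves", "mwizi", "wezi"},
    "EXPULSION": {"leave", "ondoka", "hama"},
}

# Invert the lexicon into a token -> family lookup table.
TOKEN_TO_FAMILY = {
    token: family
    for family, tokens in WORD_FAMILIES.items()
    for token in tokens
}


def family_counts(text):
    """Count how often each word family occurs in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(TOKEN_TO_FAMILY[t] for t in tokens if t in TOKEN_TO_FAMILY)


def f_score(precision, recall):
    """Balanced F1: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# A code-switched example: the Swahili and English tokens for the same
# concept land in the same family, so they share one feature.
counts = family_counts("Hawa ni wezi, every thief should ondoka")
print(dict(counts))                    # {'THEFT': 2, 'EXPULSION': 1}

# Precision 81% and recall 85% give an F1 of about 83%.
print(round(f_score(0.81, 0.85), 2))   # 0.83
```

Counting families rather than raw tokens means a frequency-based classifier sees one feature per concept regardless of which language a tweet uses for it, which is the property the abstract attributes to PDC.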
