Open Access
Ensemble-based Semi-Supervised Learning for Hate Speech Detection
Author(s) - Safa Alsafari, Samira Sadaoui
Publication year - 2021
Publication title - Proceedings of the ... International Florida Artificial Intelligence Research Society Conference
Language(s) - English
Resource type - Journals
eISSN - 2334-0762
pISSN - 2334-0754
DOI - 10.32473/flairs.v34i1.128427
Subject(s) - leverage (statistics) , computer science , ensemble learning , artificial intelligence , labeled data , voice activity detection , machine learning , supervised learning , natural language processing , speech recognition , speech processing , artificial neural network
Large and accurately labeled textual corpora are vital to developing efficient hate speech classifiers. This paper introduces an ensemble-based semi-supervised learning approach that leverages the abundance of social media content. Starting with a reliable hate speech dataset, we train and test diverse classifiers, which are then used to label a corpus of one million tweets. Next, we investigate several strategies for selecting the most confident of the resulting pseudo-labels. We assess these strategies by re-training all the classifiers on the seed dataset augmented with the trusted pseudo-labeled data. Finally, we demonstrate that our approach improves classification performance over supervised hate speech classification methods.
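
The abstract does not say which selection strategies the authors compare, so the following is only a minimal sketch of one plausible strategy, not the paper's method. It assumes each classifier in the ensemble outputs a probability that a tweet is hateful, and it keeps a pseudo-label only when every classifier votes the same way and the mean confidence in that label reaches a threshold. The function names, the 0.5 voting cutoff, and the 0.9 threshold are all hypothetical.

```python
from statistics import mean


def select_pseudo_labels(ensemble_probs, threshold=0.9):
    """Return (index, label) pairs for confidently pseudo-labeled examples.

    ensemble_probs: one inner list per unlabeled example, holding each
    classifier's predicted probability that the example is hateful.
    Assumed rule (not taken from the paper): keep an example only if the
    classifiers are unanimous and their mean confidence >= threshold.
    """
    selected = []
    for idx, probs in enumerate(ensemble_probs):
        votes = [p >= 0.5 for p in probs]
        if all(votes) or not any(votes):  # unanimous ensemble decision
            label = int(votes[0])
            # Confidence in the agreed label: p for class 1, 1 - p for class 0.
            confidence = mean(p if label else 1 - p for p in probs)
            if confidence >= threshold:
                selected.append((idx, label))
    return selected


def augment_seed(seed, unlabeled_texts, ensemble_probs, threshold=0.9):
    """Append the trusted pseudo-labeled examples to the seed dataset."""
    chosen = select_pseudo_labels(ensemble_probs, threshold)
    return seed + [(unlabeled_texts[i], label) for i, label in chosen]
```

In a full pipeline, the augmented dataset returned by `augment_seed` would be used to re-train every classifier, and the result would be compared against the seed-only baseline.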