Open Access
Study of Undersampling Method: Instance Hardness Threshold with Various Estimators for Hate Speech Classification
Author(s) - Naufal Azmi Verdikha, Teguh Bharata Adji, Adhistya Erna Permanasari
Publication year - 2018
Publication title - IJITEE (International Journal of Information Technology and Electrical Engineering)
Language(s) - English
Resource type - Journals
ISSN - 2550-0554
DOI - 10.22146/ijitee.42152
Subject(s) - undersampling, estimator, computer science, classifier (uml), weighting, process (computing), artificial intelligence, machine learning, data mining, statistics, mathematics, medicine, radiology, operating system
A text classification system is needed to address the problem of hate speech in social media. However, hate speech texts are very hard to find in social media, which leaves the training data with an unbalanced class distribution (imbalanced data). Classification with imbalanced data leads to poor performance. Several methods exist to address classification with imbalanced data. One of them is undersampling with the Instance Hardness Threshold (IHT) method. IHT balances the dataset by eliminating instances that are frequently misclassified. To find those instances, IHT requires an estimator, which is a classifier. This research compares estimators for the IHT method as a solution to the imbalanced data problem in hate speech classification, using the TF-IDF weighting method. Three criteria are used to determine the best IHT variant: the class ratio of the dataset after undersampling, the duration of the undersampling process, and the Index of Balanced Accuracy (IBA) evaluation. The results show that IHT with Logistic Regression as the estimator (IHT(LR)) has the fastest undersampling process (1.91 s), produces a perfectly balanced dataset with a class ratio of 1:1, and achieves the best IBA evaluation in all estimation processes. These results make IHT(LR) the best method for solving the imbalanced data problem in hate speech classification.
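
The two core ideas in the abstract, removing hard majority-class instances and scoring with IBA, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `iht_keep_indices` and `iba`, the toy probabilities, and the rank-based selection (keep the majority instances the estimator is most confident about until the classes are equal in size) are assumptions made for illustration. The sketch also assumes the probabilities of the true class have already been produced by some estimator such as Logistic Regression, ideally through cross-validation. The IBA formula used is the commonly cited form, (1 + alpha * (TPR - TNR)) * TPR * TNR, with alpha = 0.1 chosen as an assumed default.

```python
from typing import List, Sequence


def iht_keep_indices(labels: Sequence[int],
                     p_true: Sequence[float],
                     majority: int) -> List[int]:
    """Return the indices kept by a simplified IHT-style undersampling step.

    p_true[i] is the estimator's probability for the TRUE class of sample i.
    A low value marks a "hard" instance, i.e. one that tends to be
    misclassified. (Hypothetical helper; rank-based simplification.)
    """
    minority_idx = [i for i, y in enumerate(labels) if y != majority]
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    n_keep = min(len(minority_idx), len(majority_idx))
    # Keep the majority samples the estimator is most confident about,
    # which discards the hardest majority instances first.
    ranked = sorted(majority_idx, key=lambda i: p_true[i], reverse=True)
    return sorted(minority_idx + ranked[:n_keep])


def iba(tp: int, fn: int, tn: int, fp: int, alpha: float = 0.1) -> float:
    """Index of Balanced Accuracy for a binary confusion matrix.

    Assumes both classes are present, so neither denominator is zero.
    """
    tpr = tp / (tp + fn)  # sensitivity (recall on the positive class)
    tnr = tn / (tn + fp)  # specificity (recall on the negative class)
    dominance = tpr - tnr
    return (1 + alpha * dominance) * tpr * tnr


# Toy data: class 0 is the majority (six samples), class 1 the minority (two).
labels = [0, 0, 0, 0, 0, 0, 1, 1]
p_true = [0.9, 0.2, 0.8, 0.4, 0.95, 0.1, 0.7, 0.6]
kept = iht_keep_indices(labels, p_true, majority=0)
score = iba(tp=8, fn=2, tn=90, fp=10)
```

On the toy data, `kept` retains both minority samples plus the two majority samples with the highest true-class probability, giving a 1:1 class ratio. Threshold-based variants of IHT may retain a different number of samples, which is why the class ratio after undersampling is a meaningful criterion when comparing estimators.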
