z-logo
open-access-imgOpen Access
Text classification model for methamphetamine-related tweets in Southeast Asia using dual data preprocessing techniques
Author(s) -
Narongsak Chayangkoon,
Agnart Srivihok
Publication year - 2021
Publication title -
international journal of power electronics and drive systems/international journal of electrical and computer engineering
Language(s) - English
Resource type - Journals
eISSN - 2722-2578
pISSN - 2722-256X
DOI - 10.11591/ijece.v11i4.pp3617-3628
Subject(s) - c4.5 algorithm , computer science , feature selection , word2vec , support vector machine , naive bayes classifier , artificial intelligence , decision tree , data mining , machine learning , embedding
Methamphetamine addiction is a prominent problem in Southeast Asia. Drug addicts often discuss illegal activities on popular social networking services. These individuals spread messages on social media as a means of both buying and selling drugs online. This paper proposes a model, the “text classification model of methamphetamine tweets in Southeast Asia” (TMTA), to identify whether a tweet from Southeast Asia is related to methamphetamine abuse. The research addresses the weakness of bag of words (BoW) by introducing BoW and Word2Vec feature selection (BWF) techniques. A domain-based feature selection method was performed using the BoW dataset and Word2Vec. The BWF dataset provided a smaller number of features than the BoW and TF–IDF dataset. We experimented with three candidate classifiers: Support vector machine (SVM), decision tree (J48) and naive bayes (NB). We found that the J48 classifier with the BWF dataset provided the best performance for the TMTA in terms of accuracy (0.815), F-measure (0.818), Kappa (0.528), Matthews correlation coefficient (0.529) and high area under the ROC Curve (0.763). Moreover, TMTA provided the lowest runtime (3.480 seconds) using the J48 with the BWF dataset.

The content you want is available to Zendy users.

Already have an account? Click here to sign in.
Having issues? You can contact us here