
An Efficient Approach for Code-Mixed Emotion Classification applying Machine Learning
Author(s) -
Ahmad Mahmood,
Miguel Torres-Ruiz,
Zainab Ahmad,
Humaira Farid,
Iqra Ameer,
Rolando Quintero
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3598754
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Emotion classification aims to identify all the emotions expressed in a piece of text that best represent the author’s state of mind. The task remains challenging for under-resourced languages, especially in code-mixed text, which follows no standardized language of its own. The widespread use of social media has made code-mixed language common, and its extensive usage has drawn attention from researchers. Emotion classification is an important problem with applications ranging from healthcare and e-learning to social media. While some work has been done on code-mixed emotion classification, very few studies have focused on English and Roman Urdu. Previous work on code-mixed multi-label emotion classification for English and Roman Urdu reported relatively low results (e.g., Micro F1 = 0.67), indicating that there is still room for improvement. In this study, we address two tasks: (i) code-mixed multi-label emotion classification and (ii) code-mixed multi-class emotion classification. Our contribution lies in utilizing classical machine learning methods with three distinct multi-label and multi-class classification approaches, (i) One-Versus-Rest (OvR), (ii) Label Powerset (LP), and (iii) Binary Relevance (BR), along with two distinct feature extraction techniques. First, we employ content-based methods using TF-IDF at the word unigram level and experiment with feature sets ranging from 500 to 3000 features. Second, we use context-based methods that leverage SBERT-based models to produce embeddings that capture semantic meaning. Finally, we apply a state-of-the-art generative AI-based approach, using a quantized version of LLaMa that is fine-tuned for evaluation.
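As a rough illustration of how two of the problem-transformation approaches named above reshape the targets, the following minimal sketch applies Binary Relevance and Label Powerset to a toy annotation set. The emotion labels, the toy data, and the function names are assumptions made for illustration only; they are not the paper's dataset or implementation, and no classifier is trained here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical label inventory and toy annotations (not from the paper's dataset).
LABELS: List[str] = ["anger", "joy", "sadness"]
Y: List[Set[str]] = [
    {"joy"},
    {"anger", "sadness"},
    {"joy"},
    set(),  # a text annotated with no emotion
]


def binary_relevance(y: List[Set[str]], labels: List[str]) -> Dict[str, List[int]]:
    """Build one 0/1 target vector per label.

    Each vector would be handed to its own independent binary classifier.
    """
    return {lab: [1 if lab in s else 0 for s in y] for lab in labels}


def label_powerset(y: List[Set[str]]) -> Tuple[List[int], Dict[Tuple[str, ...], int]]:
    """Map every distinct label combination to a single class id.

    The multi-label problem becomes one multi-class problem over combinations.
    """
    class_ids: Dict[Tuple[str, ...], int] = {}
    targets: List[int] = []
    for s in y:
        key = tuple(sorted(s))
        if key not in class_ids:
            class_ids[key] = len(class_ids)  # ids assigned in order of first appearance
        targets.append(class_ids[key])
    return targets, class_ids


br_targets = binary_relevance(Y, LABELS)
lp_targets, lp_classes = label_powerset(Y)
```

Binary Relevance yields as many binary problems as there are labels and ignores label co-occurrence, whereas Label Powerset keeps co-occurrence but can produce many sparsely populated classes when label combinations are varied.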
We conducted over 2,000 experiments. The best results were obtained using classical machine learning (Micro F1 = 0.9142 for multi-label classification and Micro F1 = 0.9238 for multi-class classification) with the Binary Relevance approach in a context-based setting for both tasks. This indicates that Binary Relevance is an effective approach for breaking complex multi-label and multi-class tasks into easier ones, especially when the language is difficult in its own right.
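For readers unfamiliar with the reported metric, the sketch below computes Micro F1 in its standard form, pooling true positives, false positives, and false negatives over all labels and samples before computing F1 = 2TP / (2TP + FP + FN). The function name and the toy predictions are assumptions for illustration; the paper's own evaluation code is not shown in the source.

```python
from typing import List, Set


def micro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over label sets: counts are pooled across all samples."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        tp += len(truth & pred)  # labels predicted and present
        fp += len(pred - truth)  # labels predicted but absent
        fn += len(truth - pred)  # labels present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example with hypothetical labels: TP = 3, FP = 1, FN = 1.
gold = [{"joy"}, {"anger", "sadness"}, {"joy"}]
pred = [{"joy"}, {"anger"}, {"joy", "sadness"}]
score = micro_f1(gold, pred)  # 2*3 / (2*3 + 1 + 1) = 0.75
```

Because counts are pooled before averaging, frequent labels weigh more heavily in Micro F1 than rare ones.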