Open Access
MULTILABEL OVER-SAMPLING AND UNDER-SAMPLING WITH CLASS ALIGNMENT FOR IMBALANCED MULTILABEL TEXT CLASSIFICATION
Author(s) - Adil Yaseen Taha, Sabrina Tiun, Abdul Hadi Abd Rahman, Ali Sabah
Publication year - 2021
Publication title - Journal of ICT
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.217
H-Index - 10
eISSN - 2180-3862
pISSN - 1675-414X
DOI - 10.32890/jict2021.20.3.6
Subject(s) - overfitting , artificial intelligence , computer science , class (philosophy) , oversampling , sampling (signal processing) , machine learning , resampling , pattern recognition (psychology) , benchmark (surveying) , data mining , skewness , random forest , mathematics , statistics , artificial neural network , computer network , geodesy , bandwidth (computing) , filter (signal processing) , computer vision , geography
Simultaneous multiple labelling of documents, also known as multilabel text classification, does not perform optimally when the classes are highly imbalanced. Class imbalance entails skewness in the underlying data distribution, which makes classification more difficult. Random over-sampling and under-sampling are common approaches to the class imbalance problem. However, these approaches have several drawbacks: under-sampling is likely to discard useful data, whereas over-sampling can increase the risk of overfitting. Therefore, a new method that avoids both discarding useful data and overfitting is needed. This study proposes a method to tackle the class imbalance problem by combining multilabel over-sampling and under-sampling with class alignment (ML-OUSCA). Instead of using all the training instances, the proposed ML-OUSCA draws a new training set by over-sampling small classes and under-sampling large classes. To evaluate the proposed ML-OUSCA, average precision, average recall, and average F-measure were computed on three benchmark datasets, namely Reuters-21578, Bibtex, and Enron. Experimental results showed that ML-OUSCA outperformed the chosen baseline resampling approaches, K-means SMOTE and KNN-US. Based on these results, we conclude that designing a resampling method that accounts for class imbalance together with class alignment improves multilabel classification beyond random resampling alone.
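To make the resampling idea concrete, the sketch below illustrates the general scheme the abstract describes: drawing a new training set by over-sampling instances that carry small (minority) labels and under-sampling instances that carry big (majority) labels. This is only a minimal illustration, not the authors' ML-OUSCA algorithm; the function name, the choice of the mean label frequency as the target size, and the toy data are all assumptions introduced here for clarity.

```python
# Minimal sketch of combined multilabel over-/under-sampling.
# NOT the ML-OUSCA method from the paper; it only illustrates drawing a new
# training set by over-sampling small labels and under-sampling big labels.
import random
from collections import defaultdict


def resample_multilabel(X, Y, seed=42):
    """X: list of documents; Y: list of label sets (multilabel)."""
    rng = random.Random(seed)

    # Count how many training instances carry each label.
    label_counts = defaultdict(int)
    for labels in Y:
        for lab in labels:
            label_counts[lab] += 1

    # Assumed target size per label: the mean label frequency.
    target = sum(label_counts.values()) / len(label_counts)

    # Group instance indices by the labels they carry.
    by_label = defaultdict(list)
    for i, labels in enumerate(Y):
        for lab in labels:
            by_label[lab].append(i)

    new_indices = []
    for lab, idxs in by_label.items():
        if len(idxs) < target:
            # Minority label: over-sample by random duplication up to the target.
            extra = rng.choices(idxs, k=int(target) - len(idxs))
            new_indices.extend(idxs + extra)
        else:
            # Majority label: under-sample by keeping a random subset of size target.
            new_indices.extend(rng.sample(idxs, int(target)))

    rng.shuffle(new_indices)
    return [X[i] for i in new_indices], [Y[i] for i in new_indices]


# Toy usage example.
X = ["doc a", "doc b", "doc c", "doc d", "doc e"]
Y = [{"earn"}, {"earn"}, {"earn", "trade"}, {"trade"}, {"grain"}]
X_new, Y_new = resample_multilabel(X, Y)
print(len(X_new), "resampled instances")
```

Note that in the multilabel setting, duplicating an instance for one minority label also duplicates every other label it carries, including majority ones; this coupling between labels is precisely why a simple per-label scheme like the one above is insufficient on its own and why the paper pairs the resampling with class alignment.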