Imbalanced Learning with Oversampling based on Classification Contribution Degree
Author(s) - Jiang Zhenhao, Yang Jie, Liu Yan
Publication year - 2021
Publication title - Advanced Theory and Simulations
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.068
H-Index - 17
ISSN - 2513-0390
DOI - 10.1002/adts.202100031
Subject(s) - oversampling, benchmark (surveying), degree (music), artificial intelligence, class (philosophy), computer science, random forest, pattern recognition (psychology), machine learning, statistical classification, geography, cartography, physics, computer network, bandwidth (computing), acoustics
Imbalanced datasets are common in the real world, and their skewed class distribution leads to poor performance of general machine learning models. To address the data‐imbalance problem, a novel oversampling method based on classification contribution degree, called OS‐CCD, is presented. First, a new concept, classification contribution degree, is established based on micro and macro information extracted from raw datasets. With the classification contribution degree, OS‐CCD lets positive samples that lie near the class boundary and in an area with a high density of positive samples generate more synthetic samples than others. Furthermore, the neighbor used for oversampling is no longer chosen purely at random but according to a selection probability. Experimental results on 12 benchmark datasets show that four commonly used classifiers combined with OS‐CCD outperform the same classifiers combined with six popular oversampling methods in terms of accuracy, F1‐score, and AUC.
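The abstract does not give the formula for the classification contribution degree, so the following is only a rough sketch of the general idea it describes, not the authors' OS‐CCD. It assumes a hypothetical score (share of negative samples among the k nearest neighbors, multiplied by an inverse-distance density of positive neighbors) to decide how often each positive sample seeds a synthetic point, and a hypothetical inverse-distance probability for picking the neighbor to interpolate toward. The function name `weighted_oversample` and all of its parameters are illustrative.

```python
import math
import random


def weighted_oversample(X, y, n_new, k=5, seed=0):
    """SMOTE-style oversampling with weighted seed and neighbor selection.

    Illustrative only: the weights below are assumptions standing in for
    the paper's classification contribution degree, which the abstract
    does not define. Positive samples are those with label 1.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    if len(pos_idx) < 2:
        raise ValueError("need at least two positive samples")
    k_pos = min(k, len(pos_idx) - 1)
    k_all = min(k, len(X) - 1)

    scores, neighbors, nb_weights = [], [], []
    for i in pos_idx:
        # All other samples, sorted by Euclidean distance to sample i.
        others = sorted(
            (math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i
        )
        # Assumed boundary proxy: share of negatives among the k nearest.
        boundary = sum(y[j] != 1 for _, j in others[:k_all]) / k_all
        # Assumed density proxy: inverse mean distance to nearest positives.
        pos_nn = [(d, j) for d, j in others if y[j] == 1][:k_pos]
        mean_d = sum(d for d, _ in pos_nn) / k_pos
        density = 1.0 / (mean_d + 1e-12)
        # Small offset keeps every positive sample selectable.
        scores.append((boundary + 1e-3) * density)
        neighbors.append([j for _, j in pos_nn])
        # Assumed neighbor probability: closer positives are more likely.
        nb_weights.append([1.0 / (d + 1e-12) for d, _ in pos_nn])

    synthetic = []
    for _ in range(n_new):
        s = rng.choices(range(len(pos_idx)), weights=scores)[0]
        nb = rng.choices(neighbors[s], weights=nb_weights[s])[0]
        lam = rng.random()  # interpolation factor in [0, 1)
        a, b = X[pos_idx[s]], X[nb]
        synthetic.append([ai + lam * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic
```

Calling `weighted_oversample(X, y, n_new=len(neg) - len(pos))` on a list-of-lists feature matrix would return enough synthetic positive points to balance the two classes; each point lies on the segment between a seed sample and one of its positive neighbors.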