CSRBoost: Clustered Sampling with Resampling Boosting for Imbalanced Dataset Pattern Classification
Author(s) -
Seema Yadav,
Dhruvanshu Joshi,
Soham Mulye,
Labib Asari,
Sandeep S. Udmale,
Girish P. Bhole
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3616207
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Data mining and machine learning (DL&ML) approaches frequently encounter class imbalance (CI), especially in binary classification tasks where one class significantly outnumbers the other. Because traditional DL&ML methods tend to favor the majority class, they can become biased and, without adequate oversampling, perform poorly at identifying minority-class instances, even though the minority class often carries critical importance. This in turn raises concerns about algorithmic fairness. Addressing CI requires improving the ability to identify the different discriminatory patterns in the data by creating a large number of test cases. The proposed approach aims for fairer, less biased model performance. We present CSRBoost, an ensemble learning method for the CI problem that combines three techniques: AdaBoost, undersampling, and oversampling. To improve the model’s generalization and produce a representative, balanced dataset, the approach dynamically adjusts clusters so that the majority class is covered at sufficient granularity in a controlled manner. This preserves the dataset’s relevant structural information, so the distinct regions (horizons) within the data remain intact. In addition, SMOTE and AdaBoost let the model adapt to complex decision boundaries by enhancing data diversity and minority-class representation. CSRBoost’s improved performance on imbalanced datasets indicates that it is a dependable solution for the CI data found in real-world classification tasks.
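The resampling idea in the abstract (cluster the majority class, undersample within each cluster, then oversample the minority class by SMOTE-style interpolation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`kmeans`, `cluster_undersample`, `smote_like`, `resample`), the parameters (`k`, `keep_frac`, `k_neighbors`), the toy data, and the choice of plain k-means are all assumptions made for illustration, and the AdaBoost stage is omitted because the abstract does not say how the resampling is interleaved with boosting.

```python
import random
from math import dist


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means (an assumed clustering choice); returns one label per point."""
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)

    def assign():
        return [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign()


def cluster_undersample(majority, k, keep_frac, rng):
    """Keep a fraction of every majority cluster so each region stays represented."""
    labels = kmeans(majority, k, rng=rng)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if members:
            n_keep = max(1, round(keep_frac * len(members)))
            kept.extend(rng.sample(members, n_keep))
    return kept


def smote_like(minority, n_new, k_neighbors, rng):
    """Create synthetic points on segments between a minority point and a near neighbor."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: dist(base, p))[:k_neighbors]
        mate = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def resample(majority, minority, k=3, keep_frac=0.5, k_neighbors=3, seed=0):
    """Undersample the majority per cluster, then oversample the minority to match."""
    rng = random.Random(seed)
    maj = cluster_undersample(majority, k, keep_frac, rng)
    n_new = max(0, len(maj) - len(minority))
    mino = minority + smote_like(minority, n_new, k_neighbors, rng)
    return maj, mino


# Toy data (hypothetical): three majority blobs of 20 points and 6 minority points.
data_rng = random.Random(42)
majority = [(data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
            for cx, cy in [(0, 0), (5, 0), (0, 5)] for _ in range(20)]
minority = [(data_rng.gauss(2.5, 0.3), data_rng.gauss(2.5, 0.3)) for _ in range(6)]

maj, mino = resample(majority, minority)
print(len(majority), len(minority), "->", len(maj), len(mino))
```

Sampling inside each cluster, rather than from the majority class as a whole, is what keeps every region of the majority distribution represented after undersampling. In a full pipeline the balanced set would then be passed to a boosted classifier such as AdaBoost.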