APAROS: A Clustering-Based Hybrid Approach for Handling Overlapped Regions in Imbalanced Datasets
Author(s) -
Annam Nandini,
Tapas Kumar Mishra,
Abinash Pujahari,
Sanket Mishra
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3614993
Subject(s) - aerospace, bioengineering, communication, networking and broadcast technologies, components, circuits, devices and systems, computing and processing, engineered materials, dielectrics and plasmas, engineering profession, fields, waves and electromagnetics, general topics for engineers, geoscience, nuclear engineering, photonics and electrooptics, power, energy and industry applications, robotics and control systems, signal processing and analysis, transportation
In many real-world datasets, the class distribution is highly imbalanced, and minority class samples often lie within majority regions, leading to significant class overlap. These challenges lead to model bias and misclassification, particularly for minority classes. To address these issues, we introduce a novel technique that integrates Affinity Propagation with Adaptive Synthetic Sampling (ADASYN) and Random Oversampling (ROS), collectively termed APAROS. Using Affinity Propagation Clustering (APC), the proposed method partitions the dataset into Overlapping Clusters (OLC), Pure minority Clusters (PmC), and Pure Majority Clusters (PMC). We then apply existing oversampling methods, ADASYN in OLC and ROS in PmC, balancing the dataset while minimizing the risk of misclassification in overlapped regions. The proposed method was evaluated on ten benchmark imbalanced datasets and compared against existing resampling techniques such as ADASYN, ROS, and K-means SMOTE. Seven machine learning classifiers were employed to evaluate model performance using accuracy, precision, recall, F1-score, Matthews Correlation Coefficient (MCC), and Cohen’s Kappa coefficient. Experimental results consistently demonstrate that APAROS performs significantly better than traditional resampling techniques and produces superior classification performance, particularly in highly overlapped and imbalanced scenarios.
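The cluster categorization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary problem, assumes cluster assignments are already available from a clustering step (for example, scikit-learn's AffinityPropagation), and assumes that a cluster counts as overlapping whenever it contains samples of both classes; the abstract does not state the exact criterion. The function name categorize_clusters and the toy data are hypothetical.

```python
from collections import defaultdict


def categorize_clusters(cluster_ids, class_labels, minority_label):
    """Group sample indices by cluster and tag each cluster as OLC, PmC, or PMC.

    Assumption (not from the paper): a cluster is "overlapping" if it
    contains at least one minority and at least one majority sample.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    categories = {}
    for cid, idxs in members.items():
        n_minority = sum(1 for i in idxs if class_labels[i] == minority_label)
        if n_minority == 0:
            categories[cid] = "PMC"  # majority only: not oversampled
        elif n_minority == len(idxs):
            categories[cid] = "PmC"  # minority only: ROS, per the abstract
        else:
            categories[cid] = "OLC"  # both classes: ADASYN, per the abstract
    return categories, dict(members)


# Toy input: hypothetical cluster assignments and class labels.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
class_labels = [0, 0, 1, 1, 1, 0, 0, 0]  # 1 = minority, 0 = majority

categories, members = categorize_clusters(cluster_ids, class_labels, minority_label=1)
print(categories)  # {0: 'OLC', 1: 'PmC', 2: 'PMC'}
```

In a fuller pipeline, the sample indices stored in members would be used to pass each OLC subset to an ADASYN implementation and each PmC subset to a random oversampler (imbalanced-learn provides ADASYN and RandomOverSampler), with the PMC subsets passed through unchanged.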