
Sample Denoising and Optimization Technique Based on Noise Filtering and Evolutionary Algorithms for Imbalanced Data Classification
Author(s) -
Fhira Nhita,
Asniar,
Isman Kurniawan,
Adiwijaya
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3573786
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Imbalanced data remains a challenge in classification research and significantly influences classifier performance. The strategy widely used to address this issue is the data-level approach, or sampling method, through over-sampling, under-sampling, or hybrid-sampling techniques. However, data quality problems, such as the presence of noise, disrupt the sampling process and adversely affect classifier performance, particularly in popular over-sampling methods such as the Synthetic Minority Over-sampling Technique (SMOTE). Therefore, data preprocessing both before and after the data balancing process is crucial for improving data quality before classification is conducted. This study proposes a method that improves the data balancing process by integrating two preprocessing steps with the SMOTE sampling technique. Technically, we perform sample denoising with Tomek links before applying SMOTE, followed by sample optimization with an evolutionary algorithm after SMOTE. A genetic algorithm (GA), one of the most popular evolutionary algorithms, is used for sample optimization over both the synthetic samples generated by SMOTE and the original samples from both classes. The selected training set is then used to develop classification models with five classifiers: decision tree, logistic regression, support vector machine, k-nearest neighbors, and naive Bayes. Experimental results and statistical evaluations on 24 real-world imbalanced datasets demonstrate that our proposed method, Tomek-SMOTE-GA (TSGA), is significantly better than baseline and state-of-the-art sampling methods in terms of geometric mean, particularly when using decision tree classifiers.
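To make the first two stages of the pipeline concrete, below is a minimal NumPy sketch of Tomek-link denoising followed by SMOTE over-sampling. This is an illustration of the standard techniques named in the abstract, not the authors' implementation: the function names, the choice of `k`, and the balancing heuristic are assumptions, and the GA sample-selection stage (which the abstract describes only at a high level) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for a reproducible sketch

def nearest_neighbor(X, i):
    """Index of the nearest sample to X[i], excluding itself."""
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf
    return int(np.argmin(d))

def remove_tomek_links(X, y, majority=0):
    """Sample denoising: a Tomek link is a pair of opposite-class samples
    that are mutual nearest neighbors; drop the majority-class member."""
    drop = set()
    for i in range(len(X)):
        j = nearest_neighbor(X, i)
        if y[i] != y[j] and nearest_neighbor(X, j) == i:
            drop.add(i if y[i] == majority else j)
    keep = [i for i in range(len(X)) if i not in drop]
    return X[keep], y[keep]

def smote(X, y, minority=1, k=3, n_new=None):
    """Over-sampling: create synthetic minority samples by interpolating
    between a minority sample and one of its k nearest minority neighbors."""
    Xm = X[y == minority]
    if n_new is None:  # by default, generate enough samples to balance classes
        n_new = int((y != minority).sum() - (y == minority).sum())
    if n_new <= 0:
        return X, y
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(Xm))
        d = np.linalg.norm(Xm - Xm[i], axis=1)
        nn = np.argsort(d)[1:k + 1]        # k nearest minority neighbors (self excluded)
        j = rng.choice(nn)
        gap = rng.random()                 # random interpolation factor in [0, 1)
        synth.append(Xm[i] + gap * (Xm[j] - Xm[i]))
    X_new = np.vstack([X] + [np.asarray(synth)])
    y_new = np.concatenate([y, np.full(n_new, minority)])
    return X_new, y_new
```

In the proposed TSGA pipeline, the output of these two steps would then be passed to a GA that searches for the subset of (original plus synthetic) training samples maximizing classifier performance, before the five classifiers are trained.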