Performance of Classification Algorithms Under Class Imbalance: Simulation and Real-World Evidence
Author(s) -
Iqra Arshad,
Muhammad Umair,
Faheem Jan,
Hasnain Iftikhar,
Paulo Canas Rodrigues,
Elias A. Torres Armas,
Javier Linkolk Lopez-Gonzales
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3620264
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Class imbalance is a persistent challenge in machine learning, particularly in high-stakes applications such as medical diagnostics, bioinformatics, and fraud detection, where the minority class often represents critical cases. While prior research has examined the effect of imbalance on classifier performance, little attention has been paid to establishing practical guidelines for the minimum proportion of minority samples required to achieve reliable sensitivity. In this study, we conduct extensive simulations using synthetic datasets and evaluate five widely used classification algorithms: Logistic Regression (Logit), Support Vector Machines (SVM), Random Forest, XGBoost, and Neural Networks (NNs). Our analysis reveals that logistic regression is more effective at identifying minority-class instances under an imbalanced class distribution, in terms of both F1 score and sensitivity, whereas Neural Networks perform slightly better than logistic regression under a balanced class distribution. Importantly, we identify a practical threshold for minority-class representation: classifier sensitivity declines sharply when positive samples fall below approximately 25–30%. This finding is validated on eight real-world datasets, including large-scale applications, where Neural Networks and XGBoost demonstrate superior sensitivity. By establishing an actionable threshold, this study contributes practical guidance for dataset design and model selection in imbalanced classification problems.
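The simulation design described above can be illustrated with a minimal sketch: generate synthetic datasets while varying the minority-class proportion, fit a classifier, and record the minority-class sensitivity (recall). This is an assumption-laden illustration, not the authors' exact protocol; the dataset parameters, sample size, and helper function `sensitivity_at` are hypothetical choices for demonstration.

```python
# Illustrative sketch (not the authors' exact setup): vary the minority-class
# proportion in synthetic data and track minority-class sensitivity (recall)
# for logistic regression, one of the five classifiers compared in the study.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split


def sensitivity_at(minority_frac, n=5000, seed=0):
    """Return minority-class recall for a given minority-class proportion."""
    X, y = make_classification(
        n_samples=n, n_features=10, n_informative=5,
        weights=[1.0 - minority_frac, minority_frac],  # class proportions
        random_state=seed,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # Sensitivity = recall on the minority (positive) class.
    return recall_score(y_te, clf.predict(X_te), pos_label=1)


# Sweep across minority-class proportions around the reported 25-30% threshold.
for frac in [0.05, 0.15, 0.25, 0.35, 0.50]:
    print(f"minority={frac:.2f}  sensitivity={sensitivity_at(frac):.3f}")
```

Running a sweep of this kind across several classifiers and repeated seeds is one way to locate the proportion at which sensitivity begins to degrade.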