Optimized Breast Cancer Classification Via SHAP-Based Feature Selection and Weighted Voting Ensemble Learning
Author(s) -
Jungpil Shin,
Najmul Hassan,
Abu Saleh Musa Miah,
Taro Suzuki,
Sultan Al Farhood
Publication year - 2025
Publication title - IEEE Open Journal of the Computer Society
Language(s) - English
Resource type - Magazines
eISSN - 2644-1268
DOI - 10.1109/ojcs.2025.3610146
Subject(s) - computing and processing
Breast cancer (BC) is among the most common cancers affecting women worldwide, highlighting the urgent need for early and accurate diagnosis. Machine learning (ML) has emerged as a powerful tool for BC classification, enhancing diagnostic precision and improving patient outcomes. However, ML-based diagnostic systems often struggle to identify the most relevant features, particularly in datasets of moderate dimensionality that contain redundant or non-informative attributes. To address this, we propose an optimized ensemble learning model that integrates SHapley Additive exPlanations (SHAP)-based feature selection with a weighted soft-voting ensemble classifier. SHAP provides a model-agnostic, theoretically grounded way to identify the most influential features by quantifying each feature's contribution to model predictions. This not only improves interpretability but also reduces the feature space without compromising performance. By retaining only the 15 most important features, the model gains efficiency and offers clearer clinical insight, both essential for medical decision-making. The ensemble combines Extra Trees (ET), LightGBM, XGBoost (XGB), and a Support Vector Machine (SVM) to leverage the strengths of each. Our model achieves state-of-the-art performance on the Wisconsin Breast Cancer Diagnostic (WBCD) dataset, with 99.42% accuracy, 100% precision, and 98.44% recall. On the University of California, Irvine (UCI) preprocessed dataset, it achieves 95.32% accuracy, 93.81% precision, and 99.07% recall, demonstrating robustness across datasets. By uniting explainable AI (via SHAP) with an optimized ensemble strategy, our approach improves both classification accuracy and model transparency, making it a reliable and interpretable tool for early BC detection.
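The two core steps named in the abstract, SHAP-based top-k feature selection and weighted soft voting, can be sketched as plain arithmetic. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: it assumes per-sample SHAP values have already been computed (for example with the shap library's TreeExplainer) and that features are ranked by mean absolute SHAP value, a common global-importance criterion that the abstract does not spell out. The feature names `f0`..`f2`, the SHAP matrix, the per-model probabilities, and the weights `[1, 2, 2, 1]` are all hypothetical toy values; the abstract reports neither the ensemble weights nor how they were tuned. The paper keeps the top 15 features; the toy example uses k = 2 because it has only three features.

```python
from __future__ import annotations

from typing import Sequence


def rank_features_by_mean_abs_shap(
    shap_values: Sequence[Sequence[float]],
    feature_names: Sequence[str],
) -> list[tuple[str, float]]:
    """Rank features by global importance = mean absolute SHAP value.

    shap_values is an (n_samples x n_features) matrix of per-sample SHAP
    values, e.g. the positive-class values of a binary classifier.
    """
    n_samples = len(shap_values)
    scores = [
        (name, sum(abs(row[j]) for row in shap_values) / n_samples)
        for j, name in enumerate(feature_names)
    ]
    # Most important first; Python's sort is stable, so ties keep column order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def select_top_k(
    shap_values: Sequence[Sequence[float]],
    feature_names: Sequence[str],
    k: int,
) -> list[str]:
    """Keep the names of the k features with the largest mean |SHAP|."""
    ranking = rank_features_by_mean_abs_shap(shap_values, feature_names)
    return [name for name, _ in ranking[:k]]


def weighted_soft_vote(
    probas: Sequence[Sequence[Sequence[float]]],
    weights: Sequence[float],
) -> list[int]:
    """Weighted soft voting: average class probabilities across models, then argmax.

    probas[m][i][c] is model m's predicted probability that sample i is class c.
    """
    if len(probas) != len(weights):
        raise ValueError("expected exactly one weight per model")
    total = sum(weights)
    n_samples, n_classes = len(probas[0]), len(probas[0][0])
    predictions = []
    for i in range(n_samples):
        avg = [
            sum(w * model[i][c] for model, w in zip(probas, weights)) / total
            for c in range(n_classes)
        ]
        # argmax; on an exact tie the lower class index wins
        predictions.append(max(range(n_classes), key=lambda c: avg[c]))
    return predictions


# Toy numbers for illustration only (hypothetical, not from the paper).
example_names = ["f0", "f1", "f2"]
example_shap = [
    [0.5, -0.1, 0.02],
    [-0.4, 0.2, -0.01],
    [0.6, -0.3, 0.03],
]
# Per-model probabilities for 2 samples x 2 classes, ordered ET, LightGBM, XGB, SVM.
example_probas = [
    [[0.6, 0.4], [0.2, 0.8]],    # ET
    [[0.3, 0.7], [0.1, 0.9]],    # LightGBM
    [[0.35, 0.65], [0.3, 0.7]],  # XGB
    [[0.9, 0.1], [0.4, 0.6]],    # SVM
]

print(select_top_k(example_shap, example_names, k=2))             # ['f0', 'f1']
print(weighted_soft_vote(example_probas, weights=[1, 2, 2, 1]))  # [1, 1]
print(weighted_soft_vote(example_probas, weights=[1, 1, 1, 1]))  # [0, 1]
```

With equal weights the first sample is assigned class 0, while the hypothetical weights that favor the two boosted models flip it to class 1, which is why the choice of weights matters in a weighted soft-voting ensemble. In practice, scikit-learn's `VotingClassifier(voting="soft", weights=...)` performs the same weighted averaging of predicted probabilities followed by an argmax.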