
Dense Stack Meta Ensemble Classifier (DSMEC): An Advanced Stacking Ensemble Approach for Early Cardiovascular Disease Prediction with Robust Feature Engineering
Author(s) -
Dewan Ahmed Muhtasim,
Rishad Amin Pulok,
Md. Fahmidur Rahman Sakib,
Ruhul Amin,
Bushra Azmat Hussain,
Muhammad Muzammil,
Md. Mahfujul Hasan,
Siok Yee Tan
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3598285
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Cardiovascular disease (CVD) remains a major global health issue, and accurate early risk prediction is needed to enable timely intervention. Traditional approaches often fail to capture complex feature interactions, which limits diagnostic accuracy. Advanced statistical methods, domain-specific feature engineering, and modern machine learning (ML) methods can significantly enhance prediction performance. This study proposes the Dense Stack Meta-Ensemble Classifier (DSMEC), a two-tier stacking ensemble model that applies robust feature engineering to a Kaggle dataset of 70,000 records with demographic, clinical, and lifestyle attributes to improve CVD risk prediction. DSMEC integrates eight optimized base classifiers: Random Forest (RF), Decision Tree (DT), Gradient Boosting (GB), LightGBM, XGBoost, CatBoost, Multilayer Perceptron (MLP), and Artificial Neural Network (ANN), each fine-tuned using RandomizedSearchCV with StratifiedKFold cross-validation. A Deep Neural Network (DNN) meta-model aggregates their predictions to capture complex, non-linear data interactions. Data preprocessing comprises missing-value imputation, the Kolmogorov–Smirnov test for distribution analysis, SMOTE for class balancing, and RobustScaler normalization. Domain-specific features, including pulse pressure, mean arterial pressure, body mass index (BMI), and the systolic-to-diastolic pressure ratio, are generated, and the Mann–Whitney U test is used to assess their statistical significance. Further interaction-based features are generated using Chi-squared (χ²) tests, followed by dimensionality reduction with PCA that retains 95% of the data variance. DSMEC outperformed traditional and state-of-the-art methods, achieving 96.80% accuracy and a 95.66% AUC score, a 24.92% improvement in accuracy. Strategic feature engineering and hierarchical ensemble learning enable DSMEC to capture complex clinical data interdependencies, making it effective for early CVD risk prediction.
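The domain-specific features named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas the authors used, so the code assumes the standard clinical definitions: pulse pressure as systolic minus diastolic, mean arterial pressure approximated as diastolic plus one third of pulse pressure, and BMI as weight in kilograms over height in metres squared. The `Vitals` record, its field names, and the `derive_features` helper are hypothetical and are not taken from the paper or the dataset.

```python
from dataclasses import dataclass


@dataclass
class Vitals:
    """Hypothetical per-patient record; field names are illustrative only."""
    systolic: float    # systolic blood pressure, mmHg
    diastolic: float   # diastolic blood pressure, mmHg
    height_cm: float   # height in centimetres
    weight_kg: float   # weight in kilograms


def derive_features(v: Vitals) -> dict:
    """Compute the four derived features using standard clinical formulas.

    These formulas are an assumption; the paper may define them differently.
    """
    if v.diastolic <= 0 or v.height_cm <= 0:
        raise ValueError("diastolic pressure and height must be positive")

    pulse_pressure = v.systolic - v.diastolic
    # Common approximation: MAP = DBP + (SBP - DBP) / 3
    mean_arterial_pressure = v.diastolic + pulse_pressure / 3.0
    height_m = v.height_cm / 100.0
    bmi = v.weight_kg / (height_m ** 2)
    sys_dia_ratio = v.systolic / v.diastolic

    return {
        "pulse_pressure": pulse_pressure,
        "mean_arterial_pressure": mean_arterial_pressure,
        "bmi": bmi,
        "sys_dia_ratio": sys_dia_ratio,
    }


if __name__ == "__main__":
    example = Vitals(systolic=120, diastolic=80, height_cm=170, weight_kg=70)
    for name, value in derive_features(example).items():
        print(f"{name}: {value:.2f}")
```

In a pipeline like the one described, such derived columns would be appended to the raw attributes before significance testing and scaling. The guard against non-positive diastolic pressure or height is a design choice for this sketch, since public clinical datasets can contain implausible entries.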