Hybrid Deep Machine Learning Feature Selection for High-Dimensional Cybersecurity Data
Author(s) - Sesan Akintade, Kaushik Roy, Seongtae Kim
Publication year - 2025
Publication title - IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3615582
Subject(s) - aerospace; bioengineering; communication, networking and broadcast technologies; components, circuits, devices and systems; computing and processing; engineered materials, dielectrics and plasmas; engineering profession; fields, waves and electromagnetics; general topics for engineers; geoscience; nuclear engineering; photonics and electrooptics; power, energy and industry applications; robotics and control systems; signal processing and analysis; transportation
The rapid increase in cyber threats has heightened the demand for Intrusion Detection Systems (IDS) that are both accurate and efficient. While deep learning models outperform traditional machine learning models in identifying complex attack patterns, their effectiveness is often constrained by high-dimensional feature spaces, reduced interpretability, and increased computational cost. To address these challenges, we propose a novel IDS framework, Hybrid Deep Machine Learning Feature Selection (HDMLFS), which combines Integrated Gradients (IG) and SHapley Additive exPlanations (SHAP), exploiting their sensitivity to feature perturbations and the global consistency of their feature-importance estimates to enable a more robust, high-performing selection process. First, a correlation-based algorithm removes redundant features by analyzing the upper triangular part of the correlation matrix and discarding the less informative feature from each highly correlated pair. Next, a voting-based algorithm combines the IG and SHAP rankings to identify the most informative features, ensuring that at least half of the features are retained while maximizing relevance. The framework was evaluated on the NSL-KDD and CSE-CIC-IDS2018 datasets, reducing the feature space by 48% and 65%, respectively. Models trained on the selected features demonstrated superior performance, with ResNet-SF achieving the best results: 98.23% weighted accuracy on CSE-CIC-IDS2018, including 86.96% recall for rare Web attacks, and 99.77% accuracy on NSL-KDD, including 80.77% recall for the rare U2R attack class. These results highlight the effectiveness of HDMLFS in improving detection capability while reducing complexity, supporting efficient and interpretable IDS solutions.
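The two selection stages summarized in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The 0.95 correlation threshold, the use of absolute correlation with the class label to decide which feature of a correlated pair is "less informative," and the top-k voting rule with consensus-first ordering are all assumptions. The IG and SHAP inputs are taken as precomputed per-feature importance scores (for example, mean absolute attributions), because computing them requires a trained model and an attribution library. The helper names `pearson`, `drop_correlated`, and `vote_select` are hypothetical.

```python
from math import ceil, sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def drop_correlated(columns: Dict[str, Sequence[float]],
                    labels: Sequence[float],
                    threshold: float = 0.95) -> List[str]:
    """Stage 1 (sketch): visit only the upper triangle (i < j) of the feature
    correlation matrix and drop the less informative feature of each pair with
    |r| > threshold. ASSUMPTION: "informative" means |r| with the label."""
    names = list(columns)
    relevance = {n: abs(pearson(columns[n], labels)) for n in names}
    dropped = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:  # upper triangle: each pair is checked once
            if a in dropped or b in dropped:
                continue
            if abs(pearson(columns[a], columns[b])) > threshold:
                # On a tie, the later feature (b) is the one dropped.
                dropped.add(a if relevance[a] < relevance[b] else b)
    return [n for n in names if n not in dropped]


def vote_select(ig_scores: Dict[str, float],
                shap_scores: Dict[str, float],
                min_fraction: float = 0.5) -> List[str]:
    """Stage 2 (sketch): each method votes for its top-k features, where
    k = ceil(n * min_fraction). Consensus features (2 votes) come first, then
    1-vote features, ordered by combined rank. Exactly k features are kept.
    ASSUMPTION: both score dicts share the same feature names."""
    names = sorted(ig_scores)  # alphabetical order makes tie-breaking deterministic
    k = max(1, ceil(len(names) * min_fraction))
    rank_ig = {n: r for r, n in enumerate(sorted(names, key=lambda n: -ig_scores[n]))}
    rank_shap = {n: r for r, n in enumerate(sorted(names, key=lambda n: -shap_scores[n]))}
    votes = {n: int(rank_ig[n] < k) + int(rank_shap[n] < k) for n in names}
    ordered = sorted(names, key=lambda n: (-votes[n], rank_ig[n] + rank_shap[n], n))
    return ordered[:k]


if __name__ == "__main__":
    # Toy data (hypothetical): f2 is an exact multiple of f1, so one is redundant.
    cols = {"f1": [1, 2, 3, 4, 5, 6], "f2": [2, 4, 6, 8, 10, 12],
            "f3": [5, 3, 4, 1, 2, 6], "f4": [0, 1, 0, 1, 0, 1]}
    y = [0, 0, 0, 1, 1, 1]
    kept = drop_correlated(cols, y)
    # Hypothetical precomputed importances, standing in for mean |IG| and mean |SHAP|.
    ig = {"f1": 0.9, "f2": 0.9, "f3": 0.1, "f4": 0.5}
    shap_imp = {"f1": 0.8, "f2": 0.8, "f3": 0.2, "f4": 0.3}
    selected = vote_select({n: ig[n] for n in kept}, {n: shap_imp[n] for n in kept})
    print("after correlation filter:", kept)
    print("after IG/SHAP voting:", selected)
```

Returning exactly ceil(n/2) features is one simple way to honor the "at least half" constraint; raising `min_fraction` retains more features, and replacing the label-correlation relevance with a model-based score would be an equally valid variant of the first stage.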