Out-of-Fold Stacking Regression for Rapid Prediction of Time-to-Exploit in Newly Disclosed Vulnerabilities
Author(s) -
Jin-Ki Hong,
Sang-Joon Lee
Publication year - 2025
Publication title -
IEEE Access
Language(s) - English
Resource type - Magazines
SCImago Journal Rank - 0.587
H-Index - 127
eISSN - 2169-3536
DOI - 10.1109/access.2025.3621226
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
The Time-To-Exploit (TTE), defined as the interval between vulnerability disclosure and exploitation, has recently contracted to as little as 3–5 days, making it a critical tactical indicator in modern cyber operations. This study proposes a stacking ensemble regression model for rapid and automated TTE prediction using only the limited information available at the vulnerability disclosure stage. The framework integrates three heterogeneous feature groups (Common Vulnerability Scoring System (CVSS) base metrics, unstructured CVE descriptions, and Proof-of-Concept (PoC) event counts) through tailored preprocessing pipelines and optimized base models, whose predictions are consolidated by a meta model trained on out-of-fold (OOF) predictions. Multi-stage experiments identified the best configuration for each feature group: CVSS metrics with target encoding and Ridge regression; CVE descriptions with SecBERT embeddings, mean–max pooling, Principal Component Analysis (PCA), and CatBoost regression; and PoC event counts with logarithmic transformation and linear regression. The final stacked model outperformed all base models on both Mean Absolute Error (MAE) and Coefficient of Determination (R²). Explainability analysis using SHapley Additive exPlanations (SHAP) revealed feature contributions of 0.39 for CVSS, 0.30 for descriptions, and 0.03 for PoC events. Residual analysis confirmed prediction stability, with a near-symmetric, bell-shaped residual distribution centered at zero. Compared with a hold-out blending ensemble under identical conditions, the stacking framework delivered superior performance. Unlike existing studies centered on static severity scores or categorical classification, this work reframes vulnerability assessment as a dynamic, time-based regression problem, demonstrating the feasibility of quantitatively estimating exploitation timelines and enabling faster vulnerability response.
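The out-of-fold stacking idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic stand-ins for the three feature groups, the helper `oof_predict` is hypothetical, Ridge replaces the CatBoost regressor on the text features to keep dependencies small, and the linear meta model is a guess (the abstract does not name it). It is not the authors' implementation and does not reproduce their results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold, train_test_split


def oof_predict(make_model, X, y, n_splits=5, seed=0):
    """Hypothetical helper: predict every row with a model that was
    fitted on the other folds only, so no row sees its own target."""
    oof = np.zeros(len(y))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, hold_idx in folds.split(X):
        model = make_model().fit(X[fit_idx], y[fit_idx])
        oof[hold_idx] = model.predict(X[hold_idx])
    return oof


# Synthetic stand-ins for the three feature groups (NOT the paper's data).
rng = np.random.default_rng(0)
n = 400
X_cvss = rng.normal(size=(n, 4))            # stand-in for encoded CVSS metrics
X_text = rng.normal(size=(n, 8))            # stand-in for PCA-reduced embeddings
poc_counts = rng.poisson(2.0, size=n)       # stand-in for PoC event counts
X_poc = np.log1p(poc_counts).reshape(-1, 1) # logarithmic transform of the counts
y = (2.0 * X_cvss[:, 0] + X_text[:, 0] - 0.5 * X_poc[:, 0]
     + rng.normal(scale=0.5, size=n))       # synthetic TTE-like target

train_idx, test_idx = train_test_split(np.arange(n), test_size=0.25, random_state=0)
y_train, y_test = y[train_idx], y[test_idx]

# One base model per feature group; the model choices are assumptions.
groups = {
    "cvss": (lambda: Ridge(alpha=1.0), X_cvss),
    "text": (lambda: Ridge(alpha=1.0), X_text),  # the paper reports CatBoost here
    "poc": (lambda: LinearRegression(), X_poc),
}

oof_cols, test_cols = [], []
for make_model, X in groups.values():
    X_train, X_test = X[train_idx], X[test_idx]
    # Meta-model training inputs: out-of-fold predictions on the training rows.
    oof_cols.append(oof_predict(make_model, X_train, y_train))
    # Meta-model test inputs: base model refitted on all training rows.
    test_cols.append(make_model().fit(X_train, y_train).predict(X_test))

Z_train = np.column_stack(oof_cols)
Z_test = np.column_stack(test_cols)

# The meta model learns how to weight the three base predictions.
meta = LinearRegression().fit(Z_train, y_train)
stacked_mae = mean_absolute_error(y_test, meta.predict(Z_test))
baseline_mae = mean_absolute_error(y_test, np.full_like(y_test, y_train.mean()))
print(f"stacked MAE: {stacked_mae:.3f}  mean-baseline MAE: {baseline_mae:.3f}")
```

Training the meta model on out-of-fold predictions rather than in-sample ones keeps each base model's fit to its own training rows from leaking into the meta features. A hold-out blending ensemble, the comparison the abstract mentions, would instead fit the meta model on predictions for a single held-out split.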