Cross‐validation and peeling strategies for survival bump hunting using recursive peeling methods
Author(s) -
Dazard Jean‐Eudes,
Choe Michael,
LeBlanc Michael,
Rao J. Sunil
Publication year - 2016
Publication title -
Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.11301
Subject(s) - covariate, estimator, recursive partitioning, computer science, concordance, proportional hazards model, survival analysis, statistics, parametric statistics, cross validation, data mining, mathematics, machine learning, medicine
We introduce a framework to build a survival/risk bump hunting model with a censored time‐to‐event response. Our survival bump hunting (SBH) method is based on a recursive peeling procedure that uses a specific survival peeling criterion derived from non‐/semi‐parametric statistics such as the hazard ratio, the log‐rank test or the Nelson–Aalen estimator. To optimize the tuning parameter of the model and validate it, we introduce an objective function based on survival‐ or prediction‐error statistics, such as the log‐rank test and the concordance error rate. We also describe two alternative cross‐validation techniques adapted for the joint task of decision‐rule making by recursive peeling and survival estimation. Numerical analyses show the importance of replicated cross‐validation and the differences between criteria and techniques in both low‐ and high‐dimensional settings. Although several non‐parametric survival models exist, none address the problem of directly identifying local extrema. We show how SBH efficiently estimates extreme survival/risk subgroups, unlike other models. This provides insight into the behavior of commonly used models and suggests alternatives to be adopted in practice. Finally, we applied our SBH framework to a clinical dataset, in which we identified subsets of patients characterized by clinical and demographic covariates with a distinct extreme survival outcome for which tailored medical interventions could be made. An R package, PRIMsrc (Patient Rule Induction Method in Survival, Regression and Classification settings), is available on the Comprehensive R Archive Network (CRAN) and GitHub.
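To make the idea of recursive peeling with a survival criterion concrete, the following is a minimal Python sketch of a single peeling step scored by the two-sample log‐rank statistic. It is an illustration only, not the authors' algorithm and not the PRIMsrc API: the function names `logrank_chi2` and `peel_once`, the parameters `alpha` and `min_support`, and the rule of peeling a fixed count of points off one side of one covariate (ignoring ties) are all assumptions made for this example.

```python
def logrank_chi2(times, events, in_box):
    """Log-rank chi-square (1 d.f.) comparing in-box vs out-of-box samples.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    in_box : True for samples currently inside the candidate box
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for s in times if s >= t)  # at risk overall
        n1 = sum(1 for s, b in zip(times, in_box) if b and s >= t)  # at risk in box
        d = sum(1 for s, e in zip(times, events) if e and s == t)  # events at t
        d1 = sum(1 for s, e, b in zip(times, events, in_box)
                 if b and e and s == t)  # in-box events at t
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def peel_once(X, times, events, in_box, alpha=0.1, min_support=5):
    """Try peeling a fraction alpha of in-box samples off the lower or upper
    end of each covariate; keep the peel with the largest log-rank statistic.

    Returns (statistic, new_in_box), or None if the box would become
    smaller than min_support (hypothetical stopping rule).
    """
    inside = [i for i, b in enumerate(in_box) if b]
    k = max(1, int(alpha * len(inside)))
    if len(inside) - k < min_support:
        return None
    best_stat, best_box = -1.0, None
    for j in range(len(X[0])):
        order = sorted(inside, key=lambda i: X[i][j])
        for removed in (order[:k], order[-k:]):  # lower peel, upper peel
            candidate = list(in_box)
            for i in removed:
                candidate[i] = False
            stat = logrank_chi2(times, events, candidate)
            if stat > best_stat:
                best_stat, best_box = stat, candidate
    return best_stat, best_box
```

Calling `peel_once` repeatedly on its own output, until it returns `None`, yields a nested sequence of shrinking boxes. Under this sketch, the number of peeling steps plays the role of a tuning parameter of the kind the abstract proposes to select by cross‐validation.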