Nonparametric screening and feature selection for ultrahigh‐dimensional Case II interval‐censored failure time data
Author(s) - Hu Qiang, Zhu Liang, Liu Yanyan, Sun Jianguo, Srivastava Deo Kumar, Robison Leslie L.
Publication year - 2020
Publication title - Biometrical Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.108
H-Index - 63
eISSN - 1521-4036
pISSN - 0323-3847
DOI - 10.1002/bimj.201900154
Subject(s) - nonparametric statistics, covariate, feature selection, computer science, accelerated failure time model, curse of dimensionality, feature (linguistics), interval (graph theory), selection (genetic algorithm), data mining, statistics, mathematics, machine learning, linguistics, philosophy, combinatorics
For the analysis of ultrahigh‐dimensional data, the first step is often to perform screening and feature selection to reduce the dimensionality effectively while retaining all the active or relevant variables with high probability. Many methods have been developed for this purpose under various frameworks, but most of them apply only to complete data. In this paper, we consider an incomplete data situation, case II interval‐censored failure time data, for which there seems to be no screening procedure. Based on the idea of cumulative residuals, a model‐free, or nonparametric, method is developed and shown to have the sure independence screening property. In particular, the approach is shown to tend to rank the active variables above the inactive ones in terms of their association with the failure time of interest. A simulation study demonstrates the usefulness of the proposed method and, in particular, indicates that it works well with general survival models and is capable of capturing covariates with nonlinear effects and interactions. The approach is also applied to a childhood cancer survivor study that motivated this investigation.
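To make the idea of model‐free marginal screening concrete, the following is a minimal sketch in Python. It is not the statistic proposed in the paper, which the abstract does not spell out. As an assumption, each covariate is scored by a Cramér-von Mises-type cumulative residual of two censoring indicators, delta1 = 1(T <= U) and delta2 = 1(U < T <= V), where U < V are the two observation times of case II interval censoring. The function names, the simulated model, and the default cutoff d = n / log(n) (a common convention in the screening literature) are all illustrative choices.

```python
import numpy as np


def cumulative_residual_utility(x, indicator):
    """Illustrative marginal utility for one covariate and one 0/1 indicator.

    Computes (1/n) * sum_l [ (1/n) * sum_i (indicator_i - mean) * 1(x_i <= x_l) ]^2.
    Assumes the covariate values have no ties (e.g. a continuous covariate).
    """
    n = x.shape[0]
    order = np.argsort(x, kind="stable")         # sort subjects by covariate value
    resid = indicator[order] - indicator.mean()  # centered "residuals"
    partial = np.cumsum(resid) / n               # cumulative residual process
    return float(np.mean(partial ** 2))


def screen(X, delta1, delta2, d=None):
    """Rank covariates by the summed utility and keep the top d (hypothetical helper)."""
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))                   # assumed default cutoff
    d = min(d, p)
    util = np.array([
        cumulative_residual_utility(X[:, j], delta1)
        + cumulative_residual_utility(X[:, j], delta2)
        for j in range(p)
    ])
    keep = np.argsort(-util)[:d]                 # indices with the largest utility first
    return keep, util


# Simulated example (assumed model): covariate 0 acts linearly and
# covariate 1 acts through its square on the log failure time.
rng = np.random.default_rng(0)
n, p = 300, 200
X = rng.standard_normal((n, p))
T = np.exp(2.0 * X[:, 0] + X[:, 1] ** 2 - 1.0) * rng.exponential(size=n)
U = rng.uniform(0.2, 1.0, size=n)                # first observation time
V = U + rng.uniform(0.5, 2.0, size=n)            # second observation time, V > U
delta1 = (T <= U).astype(float)                  # failure before U
delta2 = ((T > U) & (T <= V)).astype(float)      # failure in (U, V]

keep, util = screen(X, delta1, delta2)
print("number kept:", len(keep))
print("covariate 0 kept:", 0 in keep)
```

Only the indicators and the covariates enter the score, so no survival model has to be specified. Because the score accumulates residuals over the ordered covariate values, it can also react to non-monotone dependence, such as the squared effect of covariate 1 in the simulation.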