Premium
Empirical extensions of the lasso penalty to reduce the false discovery rate in high‐dimensional Cox regression models
Author(s) -
Ternès Nils,
Rotolo Federico,
Michiels Stefan
Publication year - 2016
Publication title -
statistics in medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.6927
Subject(s) - akaike information criterion , bayesian information criterion , false discovery rate , lasso (programming language) , model selection , statistics , feature selection , mathematics , regression , dimensionality reduction , computer science , artificial intelligence , biology , world wide web , gene , biochemistry
Correct selection of prognostic biomarkers among multiple candidates is becoming increasingly challenging as the dimensionality of biological data becomes higher. Therefore, minimizing the false discovery rate (FDR) is of primary importance, while a low false negative rate (FNR) is a complementary measure. The lasso is a popular selection method in Cox regression, but its results depend heavily on the penalty parameter λ. Usually, λ is chosen using maximum cross‐validated log‐likelihood (max‐ cvl ). However, this method has often a very high FDR. We review methods for a more conservative choice of λ. We propose an empirical extension of the cvl by adding a penalization term, which trades off between the goodness‐of‐fit and the parsimony of the model, leading to the selection of fewer biomarkers and, as we show, to the reduction of the FDR without large increase in FNR. We conducted a simulation study considering null and moderately sparse alternative scenarios and compared our approach with the standard lasso and 10 other competitors: Akaike information criterion (AIC), corrected AIC, Bayesian information criterion (BIC), extended BIC, Hannan and Quinn information criterion (HQIC), risk information criterion (RIC), one‐standard‐error rule, adaptive lasso, stability selection, and percentile lasso. Our extension achieved the best compromise across all the scenarios between a reduction of the FDR and a limited raise of the FNR, followed by the AIC, the RIC, and the adaptive lasso, which performed well in some settings. We illustrate the methods using gene expression data of 523 breast cancer patients. In conclusion, we propose to apply our extension to the lasso whenever a stringent FDR with a limited FNR is targeted. Copyright © 2016 John Wiley & Sons, Ltd.