Review and evaluation of penalised regression methods for risk prediction in low‐dimensional data with few events

Pavlou Menelaos; Ambler Gareth; Seaman Shaun; De Iorio Maria; Omar Rumana Z

Publication year - 2015

Publication title - Statistics in Medicine

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.996

H-Index - 183

eISSN - 1097-0258

pISSN - 0277-6715

DOI - 10.1002/sim.6782

Subject(s) - elastic net regularization, overfitting, lasso, computer science, logistic regression, statistics, prior probability, hyperparameter, feature selection, regression, bayesian probability, predictive modelling, model selection, regression analysis, artificial intelligence, machine learning, mathematics

Risk prediction models are used to predict a clinical outcome for patients using a set of predictors. We focus on predicting binary outcomes in low‐dimensional data, of the kind typically arising in epidemiology, health services and public health research, where logistic regression is commonly used. When the number of events is small compared with the number of regression coefficients, model overfitting can be a serious problem. An overfitted model tends to demonstrate poor predictive accuracy when applied to new data. We review frequentist and Bayesian shrinkage methods that may alleviate overfitting by shrinking the regression coefficients towards zero (some methods can also provide more parsimonious models by omitting some predictors). We evaluate their predictive performance in comparison with maximum likelihood estimation using real and simulated data. The simulation study showed that maximum likelihood estimation tends to produce overfitted models with poor predictive performance in scenarios with few events, and penalised methods can offer improvement. Ridge regression performed well, except in scenarios with many noise predictors. Lasso performed better than ridge in scenarios with many noise predictors and worse in the presence of correlated predictors. Elastic net, a hybrid of the two, performed well in all scenarios. Adaptive lasso and smoothly clipped absolute deviation performed best in scenarios with many noise predictors; in other scenarios, their performance was inferior to that of ridge and lasso. Bayesian approaches performed well when the hyperparameters for the priors were chosen carefully. Their use may aid variable selection, and they can be easily extended to clustered‐data settings and to incorporate external information. © 2015 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
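The abstract compares maximum likelihood with ridge, lasso and elastic net penalisation of a logistic regression model. As a minimal sketch of how these three relate (an illustration under stated assumptions, not the authors' implementation, which the abstract does not describe), all three can be written as one elastic-net penalised log-likelihood. This sketch assumes the glmnet-style parameterisation with a penalty strength `lam` and a mixing weight `alpha`: `alpha = 0` is ridge, `alpha = 1` is lasso, and `lam = 0` is maximum likelihood. The functions `fit_penalised_logistic` and `predict_risk` are hypothetical names that fit the model by proximal gradient descent in plain Python, and the toy data are invented. Adaptive lasso, SCAD and the Bayesian approaches are not covered, and in practice `lam` would be tuned (for example by cross-validation) on standardised predictors rather than fixed by hand.

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _soft_threshold(z, t):
    """Proximal operator of t*|b|: move z towards zero by t, or set it to 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_penalised_logistic(X, y, lam=0.0, alpha=1.0, n_iter=5000):
    """Hypothetical sketch: elastic-net logistic regression via proximal gradient.

    Minimises (1/n) * sum_i [log(1 + exp(eta_i)) - y_i * eta_i]
              + lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2),
    with eta_i = b0 + x_i . beta and an unpenalised intercept b0.
    lam = 0: maximum likelihood; alpha = 0: ridge; alpha = 1: lasso.
    """
    n, p = len(X), len(X[0])
    # Step size 1/L, where L bounds the Lipschitz constant of the smooth part's
    # gradient: logistic weights are <= 1/4 and lambda_max(X'X) <= trace(X'X)
    # (the leading 1.0 accounts for the intercept column).
    trace = sum(1.0 + sum(v * v for v in row) for row in X)
    step = 1.0 / (0.25 * trace / n + lam * (1.0 - alpha))
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (fitted risk - outcome), all computed at the current iterate.
        resid = [_sigmoid(b0 + sum(b * v for b, v in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        b0 -= step * sum(resid) / n  # plain gradient step: intercept not penalised
        for j in range(p):
            # Gradient of the smooth part: log-likelihood term plus ridge term.
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n
            grad += lam * (1.0 - alpha) * beta[j]
            # The L1 (lasso) term is handled exactly by soft-thresholding.
            beta[j] = _soft_threshold(beta[j] - step * grad, step * lam * alpha)
    return b0, beta


def predict_risk(b0, beta, x):
    """Predicted probability of the event for one covariate vector x."""
    return _sigmoid(b0 + sum(b * v for b, v in zip(beta, x)))


if __name__ == "__main__":
    # Invented toy data: 6 patients, 1 predictor, 3 events (not from the paper).
    X = [[-2.0], [-1.0], [-1.0], [1.0], [1.0], [2.0]]
    y = [0, 0, 1, 0, 1, 1]
    for name, lam, alpha in [("MLE", 0.0, 1.0), ("ridge", 1.0, 0.0),
                             ("lasso", 0.5, 1.0), ("elastic net", 0.5, 0.5)]:
        b0, beta = fit_penalised_logistic(X, y, lam=lam, alpha=alpha)
        print(f"{name:12s} intercept={b0:+.3f} slope={beta[0]:+.3f}")
```

On the invented toy data the unpenalised slope is about 0.76. Ridge with `lam=1.0` shrinks it towards zero but keeps it, while the lasso with `lam=0.5` sets it exactly to zero: for these data the lasso keeps the predictor only when `lam` is below 1/3, the absolute gradient of the averaged log-likelihood at zero. Soft-thresholding is what lets the lasso (and elastic net) omit predictors, whereas the smooth ridge penalty only shrinks them. With very few events the maximum likelihood estimate may not exist at all (for example under complete separation), in which case the `lam = 0` iterates simply keep growing.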
