Simplifying a prognostic model: a simulation study based on clinical data
Author(s) - Ambler Gareth, Brady Anthony R., Royston Patrick
Publication year - 2002
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.1422
Subject(s) - akaike information criterion , computer science , model selection , lasso (programming language) , outcome (game theory) , information criteria , feature selection , statistics , data mining , machine learning , econometrics , mathematics , mathematical economics , world wide web
Prognostic models are designed to predict a clinical outcome in individuals or groups of individuals with a particular disease or condition. To avoid bias, many researchers advocate the use of full models developed by prespecifying predictors. Variable selection is not employed, and the resulting models may be large and complicated. In practice, more parsimonious models that retain most of the prognostic information may be preferred. We investigate how three methods for estimating full models (including penalized estimation and Tibshirani's lasso) affect various performance measures, including mean square error and prognostic classification, and we consider two methods (backwards elimination and a new proposal called stepdown) for simplifying full models. Simulation studies based on two medical data sets suggest that simplified models can be found that perform nearly as well as, or sometimes even better than, full models. Optimizing the Akaike information criterion appears to be appropriate for choosing the degree of simplification. Copyright © 2002 John Wiley & Sons, Ltd.
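As an illustration of the idea of choosing the degree of simplification by the Akaike information criterion, the following is a minimal sketch of generic backward elimination driven by AIC. It is not the authors' implementation and does not attempt the stepdown proposal, which the abstract does not describe. It assumes an ordinary least-squares model with Gaussian errors on synthetic data (the paper's models and data sets may differ), and the function names `solve`, `aic_ols`, and `backward_eliminate` are hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def aic_ols(X, y, cols):
    """AIC, up to an additive constant, of a least-squares fit on an
    intercept plus the predictor columns listed in `cols`."""
    n = len(y)
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(cols) + 1
    XtX = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    Xty = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    rss = sum(
        (yi - sum(bk * zk for bk, zk in zip(beta, z))) ** 2
        for z, yi in zip(Z, y)
    )
    return n * math.log(rss / n) + 2 * p


def backward_eliminate(X, y):
    """Start from the full model and repeatedly drop the predictor whose
    removal lowers AIC the most; stop when no removal lowers AIC."""
    cols = list(range(len(X[0])))
    best = aic_ols(X, y, cols)
    while cols:
        trials = [(aic_ols(X, y, [c for c in cols if c != d]), d) for d in cols]
        cand_aic, drop = min(trials)
        if cand_aic >= best:
            break
        best = cand_aic
        cols.remove(drop)
    return cols, best


# Synthetic data (an assumption for illustration): only the first two of
# five candidate predictors truly affect the outcome.
random.seed(0)
n_obs = 200
X = [[random.gauss(0, 1) for _ in range(5)] for _ in range(n_obs)]
y = [2.0 * r[0] - 1.5 * r[1] + random.gauss(0, 1) for r in X]

kept, aic = backward_eliminate(X, y)
print("retained predictor columns:", kept)
```

The stopping rule here is the simplest one available: elimination halts as soon as no single removal reduces AIC. Other rules, such as a fixed significance threshold, would give a different degree of simplification.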