Performance of using multiple stepwise algorithms for variable selection
Author(s) - Wiegand, Ryan E.
Publication year - 2010
Publication title - Statistics in Medicine
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.996
H-Index - 183
eISSN - 1097-0258
pISSN - 0277-6715
DOI - 10.1002/sim.3943
Subject(s) - stepwise regression, multivariable calculus, logistic regression, statistics, selection (genetic algorithm), sample size determination, feature selection, variable (mathematics), proportional hazards model, computer science, algorithm, mathematics, artificial intelligence, engineering, mathematical analysis, control engineering
Some research studies in the medical literature use multiple stepwise variable selection (SVS) algorithms to build multivariable models. The purpose of this study is to determine whether the use of multiple SVS algorithms in tandem (stepwise agreement) is a valid variable selection procedure. Computer simulations were developed to address stepwise agreement. Three popular SVS algorithms were tested (backward elimination, forward selection, and stepwise) on three statistical methods (linear, logistic, and Cox proportional hazards regression). Other simulation parameters explored were the sample size, number of predictors considered, degree of correlation between pairs of predictors, p-value-based entrance and exit criteria, predictor type (normally distributed or binary), and differences in stepwise agreement between any two or all three algorithms. Among stepwise methods, the rate of agreement, agreement on a model including only those predictors truly associated with the outcome, and agreement on a model containing the predictors truly associated with the outcome were measured. These rates were dependent on all simulation parameters. Mostly, the SVS algorithms agreed on a final model, but rarely on a model with only the true predictors. Sample size and candidate predictor pool size are the most influential simulation conditions. To conclude, stepwise agreement is often a poor strategy that gives misleading results and researchers should avoid using multiple SVS algorithms to build multivariable models. More research on the relationship between sample size and variable selection is needed. Published in 2010 by John Wiley & Sons, Ltd.
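The three agreement rates described in the abstract can be illustrated with a small simulation. The sketch below is not the study's code: it covers only forward selection and backward elimination on linear regression with independent, normally distributed predictors, and it uses a large-sample normal approximation for the p-values instead of the exact t distribution. The function names, the effect size of 0.5, and the default parameter values are all hypothetical choices made for illustration.

```python
import math
import numpy as np


def p_values(X, y, cols):
    """Approximate two-sided p-values for the OLS coefficients of `cols`.

    An intercept is always included. Assumption: p-values use a normal
    approximation to the t statistic, which is reasonable only when the
    residual degrees of freedom are large.
    """
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - A.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    z = beta / se
    p = [math.erfc(abs(v) / math.sqrt(2)) for v in z]
    return dict(zip(cols, p[1:]))  # drop the intercept's p-value


def forward_selection(X, y, alpha=0.05):
    """Add the candidate with the smallest p-value until none is below alpha."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        cand = {j: p_values(X, y, selected + [j])[j] for j in remaining}
        best = min(cand, key=cand.get)
        if cand[best] >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return set(selected)


def backward_elimination(X, y, alpha=0.05):
    """Drop the predictor with the largest p-value until all are below alpha."""
    selected = list(range(X.shape[1]))
    while selected:
        p = p_values(X, y, selected)
        worst = max(p, key=p.get)
        if p[worst] < alpha:
            break
        selected.remove(worst)
    return set(selected)


def agreement_rates(n_sims=200, n=100, n_pred=10, true_idx=(0, 1),
                    alpha=0.05, seed=0):
    """Return (agree, agree on only the true predictors, agree on a model
    containing the true predictors) as proportions of simulated data sets."""
    rng = np.random.default_rng(seed)
    true = set(true_idx)
    agree = only_true = contains_true = 0
    for _ in range(n_sims):
        X = rng.standard_normal((n, n_pred))
        # Hypothetical data-generating model: each true predictor has slope 0.5.
        y = 0.5 * X[:, list(true_idx)].sum(axis=1) + rng.standard_normal(n)
        f = forward_selection(X, y, alpha)
        b = backward_elimination(X, y, alpha)
        if f == b:
            agree += 1
            only_true += int(f == true)
            contains_true += int(true <= f)
    return agree / n_sims, only_true / n_sims, contains_true / n_sims


if __name__ == "__main__":
    print(agreement_rates())
```

Varying `n` and `n_pred` in `agreement_rates` gives a rough feel for how sample size and candidate pool size affect the rates, the two conditions the abstract identifies as most influential. The numbers produced by this sketch should not be read as reproducing the paper's results.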