Automatic variable selection for exposure‐driven propensity score matching with unmeasured confounders
Author(s) - Zöller Daniela, Wockner Leesa F., Binder Harald
Publication year - 2020
Publication title - Biometrical Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.108
H-Index - 63
eISSN - 1521-4036
pISSN - 0323-3847
DOI - 10.1002/bimj.201800190
Subject(s) - propensity score matching, covariate, outcome (game theory), confounding, statistics, matching (statistics), selection bias, econometrics, selection (genetic algorithm), mathematics, computer science, machine learning, mathematical economics
Abstract - Multivariable model building for propensity score modeling approaches is challenging. A common propensity score approach is exposure‐driven propensity score matching, where the best model selection strategy is still unclear. In particular, the situation may require variable selection, while it is still unclear if variables included in the propensity score should be associated with the exposure and the outcome, with either the exposure or the outcome, with at least the exposure or with at least the outcome. Unmeasured confounders, complex correlation structures, and non‐normal covariate distributions further complicate matters. We consider the performance of different modeling strategies in a simulation design with a complex but realistic structure and effects on a binary outcome. We compare the strategies in terms of bias and variance in estimated marginal exposure effects. Considering the bias in estimated marginal exposure effects, the most reliable results for estimating the propensity score are obtained by selecting variables related to the exposure. On average this results in the least bias and does not greatly increase variances. Although our results cannot be generalized, this provides a counterexample to existing recommendations in the literature based on simple simulation settings. This highlights that recommendations obtained in simple simulation settings cannot always be generalized to more complex, but realistic settings and that more complex simulation studies are needed.
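To make the strategy favoured in the abstract concrete, the following is a minimal, illustrative sketch of propensity score matching in which only exposure-related covariates enter the propensity model. It is not the authors' simulation design or code. The toy data-generating model, the correlation threshold used as a selection rule, the caliper, and all function names are assumptions made for illustration, and the greedy 1:1 matching shown here yields a risk difference among matched exposed subjects rather than the paper's exact estimand.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def simulate(n, seed):
    """Toy data (all coefficients are arbitrary assumptions): x0 is a confounder,
    x1 affects only the outcome, x2 is pure noise."""
    rng = random.Random(seed)
    X, a, y = [], [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(3)]
        ai = 1 if rng.random() < sigmoid(0.8 * x[0]) else 0
        yi = 1 if rng.random() < sigmoid(-0.5 + 0.7 * ai + 0.8 * x[0] + 0.8 * x[1]) else 0
        X.append(x); a.append(ai); y.append(yi)
    return X, a, y

def correlation(u, v):
    # Pearson correlation between two equal-length sequences.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)

def select_exposure_related(X, a, threshold=0.1):
    """Hypothetical selection rule: keep covariates whose absolute
    correlation with the exposure exceeds the threshold."""
    return [j for j in range(len(X[0]))
            if abs(correlation([row[j] for row in X], a)) > threshold]

def fit_logistic(X, y, steps=300, lr=0.5):
    """Logistic regression by batch gradient descent; returns [intercept, *coefs]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity_scores(X, a, selected):
    # Estimated probability of exposure given the selected covariates only.
    Xs = [[row[j] for j in selected] for row in X]
    w = fit_logistic(Xs, a)
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in Xs]

def match_and_estimate(ps, a, y, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement on the
    propensity score; returns (risk difference in matched pairs, number of pairs)."""
    controls = [i for i in range(len(a)) if a[i] == 0]
    diffs = []
    for i in (k for k in range(len(a)) if a[k] == 1):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
        if abs(ps[j] - ps[i]) <= caliper:
            controls.remove(j)
            diffs.append(y[i] - y[j])
    if not diffs:
        raise ValueError("no matched pairs within the caliper")
    return sum(diffs) / len(diffs), len(diffs)
```

A typical call chains the steps: `X, a, y = simulate(600, seed=1)`, then `sel = select_exposure_related(X, a)`, `ps = propensity_scores(X, a, sel)`, and `match_and_estimate(ps, a, y)`. Replacing `select_exposure_related` with a rule based on the outcome would give a comparable sketch of the competing strategies the abstract mentions.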