Evaluating uses of data mining techniques in propensity score estimation: a simulation study
Author(s) - Setoguchi Soko, Schneeweiss Sebastian, Brookhart M. Alan, Glynn Robert J., Cook E. Francis
Publication year - 2008
Publication title - Pharmacoepidemiology and Drug Safety
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.023
H-Index - 96
eISSN - 1099-1557
pISSN - 1053-8569
DOI - 10.1002/pds.1555
Subject(s) - covariate , propensity score matching , statistics , logistic regression , pruning , medicine , regression , correlation , econometrics , mathematics , agronomy , biology , geometry
Abstract

Background: In propensity score modeling, it is standard practice to optimize the prediction of exposure status based on the covariate information. In a simulation study, we examined in what situations analyses based on various types of exposure propensity score (EPS) models using data mining techniques such as recursive partitioning (RP) and neural networks (NN) produce unbiased and/or efficient results.

Method: We simulated data for a hypothetical cohort study (n = 2000) with a binary exposure/outcome and 10 binary/continuous covariates, with seven scenarios differing by non‐linear and/or non‐additive associations between exposure and covariates. EPS models used logistic regression (LR) (all possible main effects), RP1 (without pruning), RP2 (with pruning), and NN. We calculated c‐statistics (C), standard errors (SE), and bias of exposure‐effect estimates from outcome models for the PS‐matched dataset.

Results: Data mining techniques yielded higher C than LR (mean: NN, 0.86; RP1, 0.79; RP2, 0.72; and LR, 0.76). SE tended to be greater in models with higher C. Overall bias was small for each strategy, although NN estimates tended to be the least biased. C was not correlated with the magnitude of bias (correlation coefficient [COR] = −0.3, p = 0.1) but was correlated with increased SE (COR = 0.7, p < 0.001).

Conclusions: Effect estimates from EPS models by simple LR were generally robust. NN models generally provided the least numerically biased estimates. C was not associated with the magnitude of bias but was associated with increased SE.

Copyright © 2008 John Wiley & Sons, Ltd.
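To make the Method concrete, the following is a minimal Python sketch of one piece of such a simulation: generating a cohort of 2000 subjects with 10 binary/continuous covariates and a binary exposure, fitting a main-effects logistic regression propensity model, and computing its c-statistic. It is not the authors' code. The coefficient values, the covariate distributions, the plain gradient-ascent fitting routine, and all function names (`simulate_cohort`, `fit_logistic`, `predict_ps`, `c_statistic`) are assumptions chosen for illustration; the RP and NN models, the non-linear scenarios, PS matching, and the outcome model are not covered.

```python
import math
import random


def simulate_cohort(n=2000, seed=0):
    """Simulate covariates X and a binary exposure A.

    The coefficients and covariate distributions are illustrative
    assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    beta = [0.8, -0.25, 0.6, -0.4, -0.8, -0.5, 0.7, 0.0, 0.0, 0.0]
    X, A = [], []
    for _ in range(n):
        # Even-indexed covariates are binary, odd-indexed are standard normal.
        x = [float(rng.random() < 0.5) if j % 2 == 0 else rng.gauss(0.0, 1.0)
             for j in range(len(beta))]
        lin = sum(b * v for b, v in zip(beta, x))
        p_exposed = 1.0 / (1.0 + math.exp(-lin))
        X.append(x)
        A.append(1 if rng.random() < p_exposed else 0)
    return X, A


def fit_logistic(X, A, lr=0.5, n_iter=200):
    """Fit a main-effects logistic model by gradient ascent on the mean log-likelihood.

    Returns the weights [intercept, w_1, ..., w_p].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for x, a in zip(X, A):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
            resid = a - 1.0 / (1.0 + math.exp(-z))
            grad[0] += resid
            for j, xj in enumerate(x):
                grad[j + 1] += resid * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_ps(w, X):
    """Return the estimated propensity score for each row of X."""
    return [1.0 / (1.0 + math.exp(-(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))))
            for x in X]


def c_statistic(scores, labels):
    """Area under the ROC curve via the rank-sum formula, with average ranks for ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n1 = sum(labels)
    n0 = len(labels) - n1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


if __name__ == "__main__":
    X, A = simulate_cohort()
    ps = predict_ps(fit_logistic(X, A), X)
    print("c-statistic of the LR propensity model:", round(c_statistic(ps, A), 3))
```

Repeating this over many simulated datasets, and adding matching on the estimated score plus an outcome model, would be one way to reproduce the kind of comparison between C, bias, and SE that the abstract reports.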