Multiple imputation for systematically missing confounders within a distributed data drug safety network: A simulation study and real‐world example
Author(s) - Secrest Matthew H., Platt Robert W., Reynier Pauline, Dormuth Colin R., Benedetti Andrea, Filion Kristian B.
Publication year - 2020
Publication title - Pharmacoepidemiology and Drug Safety
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.023
H-Index - 96
eISSN - 1099-1557
pISSN - 1053-8569
DOI - 10.1002/pds.4876
Subject(s) - missing data, imputation (statistics), confounding, medicine, hazard ratio, statistics, confidence interval, univariate, multivariate statistics, data mining, computer science, mathematics
Purpose: In distributed data networks, some data sites may be systematically missing important confounders that are captured by other sites in the network (eg, body mass index [BMI]). Multiple imputation may help repair bias in these scenarios. However, multiple imputation has not been described for distributed data networks where data access restrictions prevent centralized analysis.

Methods: We conducted a simulation study and a real‐world analysis using the UK's Clinical Practice Research Datalink to evaluate multiple imputation for confounders that are systematically missing from a subset of data sites in mock distributed data networks. The simulation study addressed univariate missing data, while the real‐world analysis addressed multivariate missing data. Both studies were designed as retrospective cohort studies of the effect of current statin use on the risk of myocardial infarction among patients with newly treated type 2 diabetes.

Results: In our simulation study, multiple imputation repaired bias from missing BMI in all scenarios, with a median bias reduction of 118% in the default scenario. In our real‐world study, the multiply imputed analysis (hazard ratio [HR]: 0.86; 95% confidence interval [CI], 0.69‐1.08) was closer to the analysis that considered the true confounder values (HR: 0.85; 95% CI, 0.66‐1.10) than the analysis that ignored them (HR: 0.93; 95% CI, 0.73‐1.20).

Conclusions: Multiple imputation adapted to distributed data settings is a feasible method to reduce bias from unmeasured but measurable confounders when at least one database contains the variables of interest. Further research is needed to evaluate its validity in real distributed data networks.
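
The abstract does not state how the estimates from the individual imputed datasets were combined into a single hazard ratio. A common choice in multiple imputation is Rubin's rules, and the sketch below assumes that approach, applied on the log‐HR scale with a normal approximation for the confidence interval. The function name `pool_rubin` and the input values are hypothetical illustrations, not figures or code from the study.

```python
import math
from statistics import NormalDist, mean, variance


def pool_rubin(estimates, variances, level=0.95):
    """Pool per-imputation estimates with Rubin's rules (assumed method).

    estimates: point estimates (e.g., log hazard ratios), one per imputed dataset
    variances: squared standard errors of those estimates
    Returns (pooled estimate, total variance, (lower, upper) CI).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    pooled = mean(estimates)       # average of the m point estimates
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between

    # Normal approximation; a t reference distribution is often used instead.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(total)
    return pooled, total, (pooled - half_width, pooled + half_width)


# Hypothetical log-HRs and standard errors from five imputed datasets.
log_hrs = [-0.16, -0.14, -0.15, -0.17, -0.13]
std_errs = [0.11, 0.12, 0.11, 0.12, 0.11]

pooled, total, (lower, upper) = pool_rubin(log_hrs, [se ** 2 for se in std_errs])

# Back-transform to the hazard-ratio scale for reporting.
print(f"HR {math.exp(pooled):.2f} (95% CI {math.exp(lower):.2f}-{math.exp(upper):.2f})")
```

Pooling on the log scale and exponentiating afterwards keeps the interval asymmetric around the HR, which matches how the intervals in the Results are reported. In a distributed network, only these summary quantities would need to leave each site, although whether the study shared them in this form is not stated in the abstract.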