Open Access
Validity of Privacy-Protecting Analytical Methods That Use Only Aggregate-Level Information to Conduct Multivariable-Adjusted Analysis in Distributed Data Networks
Author(s) -
Xiaojuan Li,
Bruce Fireman,
Jeffrey R. Curtis,
David Arterburn,
David Fisher,
Érick Moyneur,
M. J. Gallagher,
Marsha A. Raebel,
W. Benjamin Nowell,
Lindsay Lagreid,
Sengwee Toh
Publication year - 2018
Publication title -
American Journal of Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.33
H-Index - 256
eISSN - 1476-6256
pISSN - 0002-9262
DOI - 10.1093/aje/kwy265
Subject(s) - weighting, computer science, covariate, propensity score matching, matching (statistics), confounding, statistics, data mining, aggregate (composite), aggregate data, comparability, data set, inverse probability weighting, medicine, mathematics, machine learning, artificial intelligence, materials science, combinatorics, composite material, radiology
Distributed data networks enable large-scale epidemiologic studies, but protecting privacy while adequately adjusting for a large number of covariates continues to pose methodological challenges. Using 2 empirical examples within a 3-site distributed data network, we tested combinations of 3 aggregate-level data-sharing approaches (risk-set, summary-table, and effect-estimate), 4 confounding adjustment methods (matching, stratification, inverse probability weighting, and matching weighting), and 2 summary scores (propensity score and disease risk score) for binary and time-to-event outcomes. We assessed the performance of combinations of these data-sharing and adjustment methods by comparing their results with results from the corresponding pooled individual-level data analysis (reference analysis). For both types of outcomes, the method combinations examined yielded results identical or comparable to the reference results in most scenarios. Within each data-sharing approach, comparability between aggregate- and individual-level data analysis depended on adjustment method; for example, risk-set data-sharing with matched or stratified analysis of summary scores produced identical results, while weighted analysis showed some discrepancies. Across the adjustment methods examined, risk-set data-sharing generally performed better, while summary-table and effect-estimate data-sharing more often produced discrepancies in settings with rare outcomes and small sample sizes. Valid multivariable-adjusted analysis can be performed in distributed data networks without sharing of individual-level data.
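To make the workflow described in the abstract concrete, the sketch below illustrates one of the named combinations, effect-estimate data-sharing with propensity-score inverse probability weighting, on simulated data. This is not the authors' code: the simulated covariates, the bootstrap standard error, and all function names are illustrative assumptions. Each site fits a propensity score on its own individual-level data, computes a weighted treatment effect, and shares only the point estimate and standard error; a coordinating center then combines the site-specific estimates by fixed-effect (inverse-variance) meta-analysis, so no individual-level data ever leave the sites.

```python
# Minimal sketch (assumed, not the authors' implementation) of "effect-estimate"
# data-sharing with propensity-score inverse probability weighting in a
# 3-site network. Data are simulated; only aggregate-level results are shared.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def simulate_site(n):
    """Individual-level data that never leave the site (hypothetical)."""
    x1 = rng.normal(size=n)                # continuous confounder
    x2 = rng.binomial(1, 0.4, size=n)      # binary confounder
    a = rng.binomial(1, 1 / (1 + np.exp(-(0.3 * x1 + 0.5 * x2 - 0.2))))   # treatment
    y = rng.binomial(1, 1 / (1 + np.exp(-(-1.0 + 0.7 * a + 0.4 * x1 + 0.3 * x2))))  # outcome
    return x1, x2, a, y

def site_effect_estimate(x1, x2, a, y, n_boot=200):
    """Within-site IPW analysis; only the log odds ratio and its bootstrap
    standard error are shared (aggregate-level output)."""
    def log_or(idx):
        X = sm.add_constant(np.column_stack([x1[idx], x2[idx]]))
        ps = sm.Logit(a[idx], X).fit(disp=0).predict(X)     # propensity score
        w = np.where(a[idx] == 1, 1 / ps, 1 / (1 - ps))     # inverse probability weights
        mu1 = np.sum(w * a[idx] * y[idx]) / np.sum(w * a[idx])
        mu0 = np.sum(w * (1 - a[idx]) * y[idx]) / np.sum(w * (1 - a[idx]))
        return np.log(mu1 / (1 - mu1)) - np.log(mu0 / (1 - mu0))
    n = len(y)
    est = log_or(np.arange(n))
    boot = [log_or(rng.integers(0, n, size=n)) for _ in range(n_boot)]
    return est, float(np.std(boot, ddof=1))

# Coordinating center: inverse-variance meta-analysis of the shared estimates.
site_results = [site_effect_estimate(*simulate_site(n)) for n in (2000, 3000, 2500)]
beta = np.array([b for b, _ in site_results])
se = np.array([s for _, s in site_results])
w_meta = 1.0 / se**2
pooled = np.sum(w_meta * beta) / np.sum(w_meta)
pooled_se = np.sqrt(1.0 / np.sum(w_meta))
print(f"pooled log odds ratio = {pooled:.3f} (SE {pooled_se:.3f})")
```

As the abstract notes, this effect-estimate approach can diverge from the pooled individual-level reference analysis when outcomes are rare or site samples are small; the risk-set and summary-table approaches share richer aggregate data (stratum-specific risk sets or event counts) and, with matched or stratified analysis of summary scores, can reproduce the reference results exactly.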
