Validity of Privacy-Protecting Analytical Methods That Use Only Aggregate-Level Information to Conduct Multivariable-Adjusted Analysis in Distributed Data Networks
Author(s) - Xiaojuan Li, Bruce Fireman, Jeffrey R. Curtis, David Arterburn, David Fisher, Érick Moyneur, Mia Gallagher, Marsha A. Raebel, W. Benjamin Nowell, Lindsay Lagreid, Sengwee Toh
Publication year - 2018
Publication title - American Journal of Epidemiology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.33
H-Index - 256
eISSN - 1476-6256
pISSN - 0002-9262
DOI - 10.1093/aje/kwy265
Subject(s) - weighting , computer science , covariate , propensity score matching , matching (statistics) , confounding , statistics , data mining , aggregate (composite) , aggregate data , comparability , data set , inverse probability weighting , medicine , mathematics , machine learning , artificial intelligence , materials science , combinatorics , composite material , radiology
Distributed data networks enable large-scale epidemiologic studies, but protecting privacy while adequately adjusting for a large number of covariates continues to pose methodological challenges. Using 2 empirical examples within a 3-site distributed data network, we tested combinations of 3 aggregate-level data-sharing approaches (risk-set, summary-table, and effect-estimate), 4 confounding adjustment methods (matching, stratification, inverse probability weighting, and matching weighting), and 2 summary scores (propensity score and disease risk score) for binary and time-to-event outcomes. We assessed the performance of combinations of these data-sharing and adjustment methods by comparing their results with results from the corresponding pooled individual-level data analysis (reference analysis). For both types of outcomes, the method combinations examined yielded results identical or comparable to the reference results in most scenarios. Within each data-sharing approach, comparability between aggregate- and individual-level data analysis depended on adjustment method; for example, risk-set data-sharing with matched or stratified analysis of summary scores produced identical results, while weighted analysis showed some discrepancies. Across the adjustment methods examined, risk-set data-sharing generally performed better, while summary-table and effect-estimate data-sharing more often produced discrepancies in settings with rare outcomes and small sample sizes. Valid multivariable-adjusted analysis can be performed in distributed data networks without sharing of individual-level data.
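
As an illustration of the effect-estimate data-sharing approach named in the abstract, the sketch below assumes that each site shares only a log hazard ratio and its standard error, and that the analysis center combines them with a fixed-effect inverse-variance weighted average. The abstract does not state which pooling method the authors used, so that choice, the function name `pool_fixed_effect`, and all numbers are hypothetical and not taken from the study.

```python
import math

def pool_fixed_effect(site_estimates):
    """Combine per-site (log_effect, standard_error) pairs.

    Assumed technique: fixed-effect inverse-variance weighting.
    Returns the pooled log effect and its standard error.
    """
    # Each site's weight is the inverse of its variance.
    weights = [1.0 / (se ** 2) for _, se in site_estimates]
    total_weight = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, site_estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical aggregate-level inputs from 3 sites (made-up values):
# each tuple is (log hazard ratio, standard error).
sites = [(0.10, 0.20), (0.30, 0.25), (-0.05, 0.40)]

log_hr, se = pool_fixed_effect(sites)
ci_low = math.exp(log_hr - 1.96 * se)
ci_high = math.exp(log_hr + 1.96 * se)
print(f"pooled HR = {math.exp(log_hr):.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

Only two numbers per site cross the network in this sketch, which is what makes the approach privacy-protecting. It is also consistent with the abstract's observation that this approach can diverge from the pooled individual-level analysis when outcomes are rare or samples are small, since site-level estimates are then unstable.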
