Bayesian Estimation of Disclosure Risks for Multiply Imputed, Synthetic Data
Author(s) -
Jerome P. Reiter,
Quanli Wang,
Biyuan Zhang
Publication year - 2014
Publication title -
Journal of Privacy and Confidentiality
Language(s) - English
Resource type - Journals
ISSN - 2575-8527
DOI - 10.29012/jpc.v6i1.635
Subject(s) - microdata (statistics) , confidentiality , synthetic data , dissemination , bayesian probability , computer science , survey data collection , statistics , econometrics , data mining , mathematics , census , artificial intelligence , computer security , medicine , environmental health , telecommunications , population
Agencies seeking to disseminate public use microdata, i.e., data on individual records, can replace confidential values with multiple draws from statistical models estimated with the collected data. We present a framework for evaluating disclosure risks inherent in releasing multiply-imputed, synthetic data. The basic idea is to mimic an intruder who computes posterior distributions of confidential values given the released synthetic data and prior knowledge. We illustrate the methodology with artificial fully synthetic data and with partial synthesis of the Survey of Youth in Custody.
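The intruder's calculation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes one categorical confidential variable, an intruder who knows every record except the target's, and an agency that generates each synthetic dataset from a Dirichlet-multinomial posterior predictive distribution with a symmetric Dirichlet prior. The names `log_dm_kernel` and `intruder_posterior` and the parameter `alpha` are hypothetical, introduced only for illustration.

```python
from math import exp, lgamma, log


def log_dm_kernel(synthetic_counts, dirichlet_params):
    """Log Dirichlet-multinomial likelihood of one synthetic dataset.

    The multinomial coefficient is dropped because it is the same for
    every candidate value and cancels when the posterior is normalized.
    """
    total_a = sum(dirichlet_params)
    n = sum(synthetic_counts)
    out = lgamma(total_a) - lgamma(total_a + n)
    for a, s in zip(dirichlet_params, synthetic_counts):
        out += lgamma(a + s) - lgamma(a)
    return out


def intruder_posterior(known_counts, synthetic_datasets, prior, alpha=1.0):
    """Posterior over the target's category given the released synthetic data.

    known_counts: category counts for all records except the target
        (assumed known to the intruder).
    synthetic_datasets: list of category-count vectors, one per release.
    prior: intruder's prior probabilities for the target's category
        (all entries must be positive).
    alpha: assumed symmetric Dirichlet prior used by the agency's synthesizer.
    """
    log_post = []
    for c, p in enumerate(prior):
        # Hypothesize that the target's true value is category c.
        params = [alpha + k + (1 if j == c else 0)
                  for j, k in enumerate(known_counts)]
        # Synthetic datasets are independent draws given the observed data.
        loglik = sum(log_dm_kernel(s, params) for s in synthetic_datasets)
        log_post.append(log(p) + loglik)
    # Normalize on the log scale for numerical stability.
    mx = max(log_post)
    weights = [exp(v - mx) for v in log_post]
    z = sum(weights)
    return [w / z for w in weights]


# Illustrative inputs (made up, not from the paper).
known_counts = [5, 5, 5]
synthetic_datasets = [[12, 4, 4]]
prior = [1 / 3, 1 / 3, 1 / 3]
posterior = intruder_posterior(known_counts, synthetic_datasets, prior)
print(posterior)
```

In this made-up example the synthetic data contain more records in the first category, so the intruder's posterior shifts toward that category relative to the uniform prior. A posterior that concentrates sharply on the true value would indicate high disclosure risk under these assumptions.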