Open Access
How Can We Analyze Differentially-Private Synthetic Datasets?
Author(s) - Anne-Sophie Charest
Publication year - 2011
Publication title - Journal of Privacy and Confidentiality
Language(s) - English
Resource type - Journals
ISSN - 2575-8527
DOI - 10.29012/jpc.v2i2.589
Subject(s) - synthetic data , computer science , differential privacy , data mining , bayesian probability , imputation (statistics) , machine learning , artificial intelligence , missing data
Synthetic datasets generated within the multiple imputation framework are now commonly used by statistical agencies to protect the confidentiality of their respondents. More recently, researchers have also proposed techniques to generate synthetic datasets that offer the formal guarantee of differential privacy. While combining rules were derived for the first type of synthetic datasets, little has been said about the analysis of differentially-private synthetic datasets generated with multiple imputations. In this paper, we show that the usual combining rules cannot be used to analyze synthetic datasets generated to achieve differential privacy. We consider specifically the case of generating synthetic count data with the beta-binomial synthesizer, and illustrate our discussion with simulation results. As a simple alternative, we also propose a Bayesian model that explicitly models the mechanism of synthetic data generation.
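To make the beta-binomial synthesizer concrete, the following is a minimal sketch of the basic mechanism: an observed count y out of n trials is combined with a Beta(a, b) prior to give a Beta(a + y, b + n - y) posterior for the underlying proportion, and each of m synthetic counts is drawn by first sampling a proportion from that posterior and then sampling a new binomial count. The prior parameters and the m = 5 default here are illustrative assumptions, not values from the paper, and this plain synthesizer by itself does not guarantee differential privacy; achieving that guarantee requires additional calibration of the mechanism, which is exactly why the paper argues the usual combining rules no longer apply.

```python
import random


def beta_binomial_synthesize(y, n, m=5, a=1.0, b=1.0, seed=0):
    """Draw m synthetic counts for an observed count y out of n trials.

    Under a Beta(a, b) prior on the binomial proportion, the posterior is
    Beta(a + y, b + n - y). Each synthetic dataset is one new count drawn
    from Binomial(n, p) with p sampled from that posterior.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(m):
        # Sample a proportion from the Beta posterior.
        p = rng.betavariate(a + y, b + n - y)
        # Sample a synthetic count from Binomial(n, p).
        y_syn = sum(rng.random() < p for _ in range(n))
        synthetic.append(y_syn)
    return synthetic


# Illustration: 30 successes observed out of 100 trials, 5 synthetic counts.
print(beta_binomial_synthesize(30, 100))
```

Each element of the returned list plays the role of one synthetic dataset; an analyst would compute an estimate from each and then combine the m estimates, which is where the choice of combining rules becomes critical.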
