Open Access
Model Selection When Multiple Imputation is Used to Protect Confidentiality in Public Use Data
Author(s) - Satkartar K. Kinney, Jerome P. Reiter, James O. Berger
Publication year - 2011
Publication title - Journal of Privacy and Confidentiality
Language(s) - English
Resource type - Journals
ISSN - 2575-8527
DOI - 10.29012/jpc.v2i2.588
Subject(s) - imputation (statistics), computer science, identifier, confidentiality, bayes' theorem, data mining, model selection, selection (genetic algorithm), bayesian probability, missing data, machine learning, artificial intelligence, computer security, programming language
Several statistical agencies use, or are considering the use of, multiple imputation to limit the risk of disclosing respondents' identities or sensitive attributes in public use data files. For example, agencies can release partially synthetic datasets, comprising the units originally surveyed with some values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple imputations. We describe how secondary analysts of such multiply-imputed datasets can implement Bayesian model selection procedures that appropriately condition on the multiple datasets and the information released by the agency about the imputation models. We illustrate by deriving Bayes factor approximations and a data augmentation step for stochastic search variable selection algorithms.
