Maintaining Analytic Utility while Protecting Confidentiality of Survey and Nonsurvey Data
Author(s) - Avinash C. Singh
Publication year - 2010
Publication title - Journal of Privacy and Confidentiality
Language(s) - English
Resource type - Journals
ISSN - 2575-8527
DOI - 10.29012/jpc.v1i2.571
Subject(s) - confidentiality, computer science, macro, actuarial science, econometrics, statistics, data mining, mathematics, business, computer security, programming language
Consider a complete rectangular database at the micro (or unit) level from a survey (sample or census) or nonsurvey (administrative) source in which potential identifying variables (IVs) are suitably categorized (so that the analytic utility is essentially maintained) to reduce the pretreatment disclosure risk as far as possible. The pretreatment risk is due to the presence of unique records (with respect to IVs) or nonuniques (i.e., more than one record sharing a common IV profile) with similar values of at least one sensitive variable (SV). This setup covers macro (or aggregate) level data, including tabular data, because a common mean value (of 1 in the case of count data) can be assigned to all units in the aggregation or cell. Our goal is to create a public use file with simultaneous control of disclosure risk and information loss after disclosure treatment by perturbation (i.e., substitution of IVs and not SVs) and suppression (i.e., subsampling-out of records). In this paper, an alternative framework for measuring information loss and disclosure risk under a nonsynthetic approach, as proposed by Singh (2002, 2006), is considered which, in contrast to the commonly used deterministic treatment, is based on a stochastic selection of records for disclosure treatment in the sense that all records are subject to treatment (with possibly different probabilities), but only a small proportion of them are actually treated. We also propose an extension of the above alternative framework of Singh with the goal of generalizing risk measures to allow partial risk scores for unique and nonunique records. Survey sampling techniques of sample allocation are used to assign substitution and subsampling rates to risk strata defined by unique and nonunique records such that bias due to substitution and variance due to subsampling for main study variables (functions of SVs and IVs) are minimized.
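The stochastic selection idea above can be sketched in a few lines: every record receives a Bernoulli draw at its stratum's treatment rate, so all records are at risk of treatment, but only a small expected fraction is actually treated. This is an illustrative sketch, not the paper's implementation; the stratum labels and the rates used here are hypothetical stand-ins for the substitution/subsampling rates that the paper derives via sample-allocation techniques.

```python
import random

def stochastic_treatment_flags(records, rates):
    """Draw a Bernoulli treatment flag per record at its stratum's rate.

    Every record *can* be selected (nonzero probability in each
    stratum), but only a small expected proportion is treated.
    """
    return [random.random() < rates[r["stratum"]] for r in records]

# Hypothetical risk strata and rates: uniques are riskier, so they
# get a higher treatment rate than nonuniques (values are assumed).
rates = {"unique": 0.30, "nonunique": 0.05}
records = [{"id": i, "stratum": "unique" if i % 10 == 0 else "nonunique"}
           for i in range(1000)]

random.seed(42)  # for reproducibility of the illustration
flags = stochastic_treatment_flags(records, rates)
treated_fraction = sum(flags) / len(records)
# Expected fraction here is 0.1*0.30 + 0.9*0.05 = 0.075: small,
# even though no record was exempt from possible treatment.
```

The design point is that an intruder cannot tell from the released file which records were perturbed, since selection was random rather than deterministic.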
This is followed by calibration to controls based on original estimates of main study variables so that these estimates are preserved, and bias and variance for other study variables may also be reduced. The above alternative framework leads to the method of disclosure treatment known as MASSC (signifying micro-agglomeration, substitution, subsampling, and calibration) and to an enhanced method (denoted GenMASSC) which uses generalized risk measures. The GenMASSC method is illustrated through a simple example, followed by a discussion of the relative merits and demerits of nonsynthetic and synthetic methods of disclosure treatment.
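The calibration step can be illustrated with the simplest single-control case: after subsampling, the retained records' weights are rescaled so that the weighted total of a main study variable matches its original estimate exactly. This is a minimal ratio-calibration sketch under assumed numbers; the paper's calibration handles multiple controls (full calibration, e.g. GREG-type, solves for all of them simultaneously).

```python
def ratio_calibrate(weights, x, target_total):
    """Rescale weights by a common factor g so that the weighted
    total of x equals target_total (single-control ratio calibration).
    """
    current = sum(w * xi for w, xi in zip(weights, x))
    g = target_total / current
    return [w * g for w in weights]

# Assumed setup: the original full file gave an estimated total of
# 5000 for a main study variable; after subsampling-out half the
# records, the 400 retained records carry inverse-rate weights of 2.
orig_total = 5000.0
weights = [2.0] * 400   # base weight x inverse subsampling rate
x = [12.5] * 400        # main study variable on retained records
cal = ratio_calibrate(weights, x, orig_total)

new_total = sum(w * xi for w, xi in zip(cal, x))
# new_total now reproduces orig_total exactly, so the released file
# preserves the original estimate despite the suppression.
```

Because the calibrated weights track the controls, estimates for other study variables correlated with the controls also tend to have reduced bias and variance, as the abstract notes.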
