Data masking for disclosure limitation
Author(s) - Duncan George, Stokes Lynne
Publication year - 2009
Publication title - Wiley Interdisciplinary Reviews: Computational Statistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.693
H-Index - 38
eISSN - 1939-0068
pISSN - 1939-5108
DOI - 10.1002/wics.3
Subject(s) - microdata (statistics) , computer science , confidentiality , outlier , synthetic data , masking (illustration) , data mining , identifier , data science , computer security , artificial intelligence , art , population , demography , sociology , visual arts , census , programming language
Protecting confidentiality is essential to the functioning of systems for collecting and disseminating data on individuals and enterprises that are necessary for evidence‐based public policy formulation. Deidentification of records, defined as removing obvious identifiers such as name and address, is not sufficient to protect confidentiality. Microdata have characteristics that lead to increased disclosure risk, such as existence of identification files, geographical detail, outliers, many/detailed attribute variables, or longitudinal or panel structure in the data. Data stewardship organizations can lower disclosure risk through disclosure limitation methods and through the construction of synthetic data. Both record and attribute suppression can be represented by matrix masks, as can perturbation through noise addition, and data swapping. Also sampling and aggregation have matrix mask representations. Distinct from masking methods, synthetic data construction considers the microdata to be a realization of some statistical model. It then replaces the true microdata with samples generated according to the model. The released data consist of records of individual synthetic units rather than records for the actual units. The organization must recognize uncertainty in both model form and values of model parameters. This argues for the relevance of hierarchical and mixture models to generate the synthetic data. Synthetic data has an advantage as a disclosure limitation method over masked data because it is easier for the user to analyze.

Copyright © 2009 Wiley Periodicals, Inc., A Wiley Company

This article is categorized under: Data: Types and Structure > Data Preparation and Processing
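
The abstract states that suppression, noise addition, data swapping, sampling, and aggregation all have matrix mask representations but does not show the form. The sketch below assumes the common formulation in which an n-by-p microdata matrix X is released as A X B + C, where A acts on records (rows), B acts on attributes (columns), and C displaces values. The toy data, the helper names (`matmul`, `apply_mask`, and so on), and the specific way the swap is encoded through C are illustrative assumptions, not taken from the article.

```python
import random

def matmul(P, Q):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def matadd(P, Q):
    """Add two matrices of the same shape element by element."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def apply_mask(X, A=None, B=None, C=None):
    """Return A X B + C. Omitted A and B default to identity, omitted C to zero."""
    A = identity(len(X)) if A is None else A
    B = identity(len(X[0])) if B is None else B
    out = matmul(matmul(A, X), B)
    if C is None:
        C = [[0.0] * len(out[0]) for _ in out]
    return matadd(out, C)

# Hypothetical microdata: rows are records; columns are age, income, household size.
X = [[34.0, 52000.0, 3.0],
     [51.0, 250000.0, 2.0],   # income outlier, a high-risk record
     [29.0, 41000.0, 1.0],
     [45.0, 67000.0, 4.0]]

# Record suppression (and sampling): A is the identity with row 1 deleted.
A_suppress = [row for i, row in enumerate(identity(4)) if i != 1]
without_outlier = apply_mask(X, A=A_suppress)

# Attribute suppression: B is the identity with column 1 (income) deleted.
B_suppress = [[v for j, v in enumerate(row) if j != 1] for row in identity(3)]
without_income = apply_mask(X, B=B_suppress)

# Aggregation: each row of A averages a pair of records.
A_aggregate = [[0.5, 0.5, 0.0, 0.0],
               [0.0, 0.0, 0.5, 0.5]]
aggregated = apply_mask(X, A=A_aggregate)

# Noise addition: C holds seeded Gaussian noise on the income column only.
rng = random.Random(0)
C_noise = [[0.0, rng.gauss(0.0, 1000.0), 0.0] for _ in X]
noisy = apply_mask(X, C=C_noise)

# Data swapping: exchange income between records 0 and 2.
# One way to encode it: C = (P - I) X E, with P the row permutation
# and E the matrix that keeps only the income column.
P_minus_I = [[-1.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 0.0],
             [1.0, 0.0, -1.0, 0.0],
             [0.0, 0.0, 0.0, 0.0]]
E_income = [[0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0]]
C_swap = matmul(matmul(P_minus_I, X), E_income)
swapped = apply_mask(X, C=C_swap)
```

In this framing, each disclosure limitation method differs only in which of A, B, and C departs from its default, which is what makes the methods easy to compare and compose.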
