Statistical methods for some simple disclosure limitation rules[Note 1. jpnk@cbs.nl The views expressed in this paper are those ...]
Author(s) - Pannekoek J.
Publication year - 1999
Publication title - Statistica Neerlandica
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.52
H-Index - 39
eISSN - 1467-9574
pISSN - 0039-0402
DOI - 10.1111/1467-9574.00097
Subject(s) - confidentiality , guard (computer science) , coding (social sciences) , computer science , set (abstract data type) , population , sample (material) , data mining , simple (philosophy) , process (computing) , statistical analysis , data science , statistics , computer security , mathematics , medicine , philosophy , chemistry , environmental health , epistemology , chromatography , programming language , operating system
To guard the confidentiality of information provided by respondents, statistical offices apply disclosure limitation techniques. A frequently applied technique is to ensure that there are no categories for which the population frequency is presumed to be small (‘rare’ categories). This is achieved by recoding, top‐coding or setting values to ‘unknown’. Since population frequencies are usually not available, the decision that a category is rare is often based on intuitive considerations. This is a time-consuming process that requires many decisions by disclosure limitation practitioners. This paper explores to what extent sample frequencies can be used to make such decisions. This leads to a procedure that makes it possible to scan a data set automatically for rare category combinations, where ‘rare’ is defined by the disclosure limitation policy of the statistical office.
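The idea of scanning a data set for rare category combinations can be illustrated with a minimal sketch. The abstract does not give the paper's statistical decision rule, so the code below substitutes a plain sample-count threshold as a stand-in. The function name `scan_rare_combinations`, the parameters `threshold` and `max_dim`, and the example records are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def scan_rare_combinations(records, key_variables, threshold, max_dim=2):
    """Flag category combinations whose sample count is below `threshold`.

    Assumption: a simple count cut-off stands in for the statistical rule
    developed in the paper, which the abstract does not spell out.
    Returns a dict mapping (variables, values) to the sample count.
    """
    flagged = {}
    for dim in range(1, max_dim + 1):
        # Consider every set of `dim` identifying variables.
        for variables in combinations(key_variables, dim):
            counts = Counter(tuple(r[v] for v in variables) for r in records)
            for values, n in counts.items():
                if n < threshold:
                    flagged[(variables, values)] = n
    return flagged


# Hypothetical sample records, invented for this illustration only.
sample = [
    {"region": "North", "occupation": "teacher"},
    {"region": "North", "occupation": "teacher"},
    {"region": "North", "occupation": "farmer"},
    {"region": "South", "occupation": "teacher"},
    {"region": "South", "occupation": "teacher"},
    {"region": "South", "occupation": "mayor"},
]

flagged = scan_rare_combinations(
    sample, ["region", "occupation"], threshold=2, max_dim=2
)
for (variables, values), n in sorted(flagged.items()):
    print(dict(zip(variables, values)), "sample count:", n)
```

A low sample count does not by itself establish a low population frequency, which is the gap the paper's statistical methods are meant to address. In this sketch the value of `threshold` would have to come from the disclosure limitation policy of the statistical office, and flagged combinations would be candidates for recoding, top‐coding or setting to ‘unknown’.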