Using k-anonymization for registry data: pitfalls and alternatives
Author(s) - Sten Anspal, Mart Kaska, Indrek Seppo
Publication year - 2017
Publication title - Acta et Commentationes Universitatis Tartuensis de Mathematica
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.276
H-Index - 6
eISSN - 2228-4699
pISSN - 1406-2283
DOI - 10.12697/acutm.2017.21.05
Subject(s) - microdata (statistics) , statistician , confidentiality , computer science , lossless compression , information privacy , data science , data anonymization , data mining , econometrics , internet privacy , statistics , computer security , census , medicine , mathematics , artificial intelligence , data compression , population , environmental health
We describe an applied study of ICT students' employment in Estonia based on data from two national registries. The study offered an opportunity to compare results obtained from k-anonymized data with those from the novel SHAREMIND platform for privacy-preserving statistical computing, which allows confidential data to be used for research without loss of information. A comparison of results from k-anonymized and lossless data indicates substantial differences in estimates of students' employment rates. Drawing on a real-world study, the results illustrate how k-anonymization can introduce considerable bias into estimates. While privacy-preserving computing does entail inconveniences because the original microdata are not revealed to the statistician, these can be offset by greater confidence in the results.
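To make the bias mechanism concrete, here is a minimal sketch in Python of k-anonymization by record suppression. It assumes hypothetical quasi-identifiers (age group, study programme) and invented toy records. It is not the authors' procedure, and the numbers are not the study's data. Records whose quasi-identifier combination occurs fewer than k times are dropped. If those rare records differ systematically in the outcome, the estimated employment rate shifts.

```python
from collections import Counter

# Hypothetical toy records, invented for illustration (not data from the study).
# Each record is (quasi-identifier combination, employed flag: 1 = employed).
Record = tuple[tuple[str, str], int]


def make_group(qi: tuple[str, str], n_employed: int, n_total: int) -> list[Record]:
    """Hypothetical helper: n_total records sharing one quasi-identifier combination."""
    return [(qi, 1 if i < n_employed else 0) for i in range(n_total)]


records: list[Record] = (
    make_group(("20-24", "BSc Informatics"), 2, 6)
    + make_group(("20-24", "MSc Informatics"), 2, 4)
    # Rare combinations: in this toy data, the older students are all employed.
    + make_group(("30-34", "MSc Informatics"), 1, 1)
    + make_group(("35-39", "BSc Informatics"), 1, 1)
    + make_group(("40-44", "MSc Informatics"), 1, 1)
)


def suppress_to_k_anonymity(rows: list[Record], k: int) -> list[Record]:
    """Drop every record whose quasi-identifier combination occurs fewer than k times.

    The surviving rows are k-anonymous: each remaining combination appears at least k times.
    """
    counts = Counter(qi for qi, _ in rows)
    return [row for row in rows if counts[row[0]] >= k]


def employment_rate(rows: list[Record]) -> float:
    """Share of records whose employed flag is 1."""
    return sum(employed for _, employed in rows) / len(rows)


anonymized = suppress_to_k_anonymity(records, k=3)
print(f"full data: n={len(records)}, rate={employment_rate(records):.3f}")    # full data: n=13, rate=0.538
print(f"k=3 data:  n={len(anonymized)}, rate={employment_rate(anonymized):.3f}")  # k=3 data:  n=10, rate=0.400
```

In this toy example, suppression removes the three rare, all-employed records, so the estimate falls from about 0.54 to 0.40. Practical k-anonymization tools usually combine suppression with generalization (for example, coarsening age bands). This sketch models suppression only, so it illustrates one source of bias and does not reproduce the study's figures.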