Open Access

Using k-anonymization for registry data: pitfalls and alternatives

Author(s) - Sten Anspal, Mart Kaska, Indrek Seppo

Publication year - 2017

Publication title - Acta et Commentationes Universitatis Tartuensis de Mathematica

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.276

H-Index - 6

eISSN - 2228-4699

pISSN - 1406-2283

DOI - 10.12697/acutm.2017.21.05

Subject(s) - microdata (statistics), statistician, confidentiality, computer science, lossless compression, information privacy, data science, data anonymization, data mining, econometrics, internet privacy, statistics, computer security, census, medicine, mathematics, artificial intelligence, data compression, population, environmental health

We describe an applied study of ICT students' employment in Estonia based on data from two national registries. The study offered an opportunity to compare results obtained from k-anonymized data with those obtained on the novel SHAREMIND platform for privacy-preserving statistical computing, which allows confidential data to be used for research without loss of information. The comparison of results from k-anonymized and lossless data indicates substantial differences in estimates of students' employment rates. The results illustrate, on the basis of a real-world study, how k-anonymization can lead to considerable bias in estimates. While privacy-preserving computing does entail inconveniences because the original microdata are not revealed to the statistician, these can be offset by greater confidence in the results.
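
One way such bias can arise is sketched below in Python. This is an illustration only: it assumes k-anonymization by record suppression (dropping every record whose quasi-identifier combination occurs fewer than k times), which is one common variant and not necessarily the procedure used in the study. The records, the field names `curriculum`, `study_year` and `employed`, and the helper functions are all hypothetical, and the numbers are synthetic rather than taken from the registries.

```python
from collections import Counter


def k_anonymize_by_suppression(records, quasi_identifiers, k):
    """Keep only records whose quasi-identifier combination occurs at least k times."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] >= k]


def employment_rate(records):
    """Share of records flagged as employed."""
    return sum(r["employed"] for r in records) / len(records)


# Synthetic records (assumption: invented for illustration, not the study's data).
# Ten first-year students, three of them employed.
records = [
    {"curriculum": "informatics", "study_year": 1, "employed": i < 3}
    for i in range(10)
]
# Two third-year students, both employed: a small cell that k = 3 will suppress.
records += [
    {"curriculum": "informatics", "study_year": 3, "employed": True}
    for _ in range(2)
]

anonymized = k_anonymize_by_suppression(records, ["curriculum", "study_year"], k=3)

full_rate = employment_rate(records)      # 5 employed out of 12 records
anon_rate = employment_rate(anonymized)   # 3 employed out of 10 records

print(f"full data:    {full_rate:.3f}")
print(f"k-anonymized: {anon_rate:.3f}")
```

In this toy example the suppressed cell differs systematically from the rest (both of its members are employed), so the estimate on the k-anonymized data is lower than the one on the full data. Whether and in which direction real registry data are affected depends on how the small cells differ from the large ones.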
