
Information theoretic-based privacy risk evaluation for data anonymization
Author(s) -
Anis Bkakria,
Frédéric Cuppens,
Nora Cuppens,
Αιμιλία Τασίδου
Publication year - 2022
Publication title -
Journal of Surveillance, Security and Safety
Language(s) - English
Resource type - Journals
ISSN - 2694-1015
DOI - 10.20517/jsss.2020.20
Subject(s) - metric (unit), computer science, inference, data anonymization, data publishing, identification (biology), data mining, process (computing), information sensitivity, information privacy, publishing, artificial intelligence, computer security, operations management, botany, political science, law, biology, operating system, economics
Aim: Data anonymization aims to enable data publishing without compromising individuals' privacy. The re-identification and sensitive information inference risks of a dataset are important factors when choosing the techniques and parameters of the anonymization process. Measuring these risks correctly helps optimize the balance between protection and utility of the dataset: overly aggressive anonymization can render the data useless, while publishing data with a high risk of de-anonymization is problematic.
Methods: In this paper, we propose a new information theoretic-based privacy metric (ITPR) for assessing both the re-identification risk and the sensitive information inference risk of datasets. We compare the proposed metric with existing information theoretic metrics in terms of their ability to assess risk for datasets with various characteristics.
Results: We show that, among the metrics compared, ITPR is the only one that can effectively quantify both re-identification and sensitive information inference risks. We provide several experiments to illustrate the effectiveness of ITPR.
Conclusion: To the best of our knowledge, ITPR is the first information theoretic-based privacy metric that correctly assesses both re-identification and sensitive information inference risks.
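The abstract does not define ITPR, so the sketch below does not implement it. As an illustration only, it computes two generic quantities that risk assessments of this kind commonly build on: the empirical conditional entropy of a sensitive attribute given the quasi-identifiers (a low value suggests the sensitive value is easy to infer) and the worst-case re-identification probability, taken as one over the size of the smallest quasi-identifier group. The function names, the column names, and the toy records are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(records, quasi_ids, sensitive):
    """Empirical H(sensitive | quasi_ids) in bits.

    Illustrative only: this is a standard conditional entropy estimate,
    not the ITPR metric proposed in the paper.
    """
    # Count sensitive values inside each quasi-identifier group.
    groups = defaultdict(Counter)
    for record in records:
        key = tuple(record[q] for q in quasi_ids)
        groups[key][record[sensitive]] += 1

    total = len(records)
    entropy = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        group_entropy = -sum((c / size) * log2(c / size) for c in counts.values())
        # Weight each group's entropy by the share of records it holds.
        entropy += (size / total) * group_entropy
    return entropy


def max_reidentification_risk(records, quasi_ids):
    """Worst-case 1 / (group size) over all quasi-identifier groups."""
    sizes = Counter(tuple(record[q] for q in quasi_ids) for record in records)
    return 1.0 / min(sizes.values())


# Hypothetical, already generalized toy dataset (assumed for illustration).
example_records = [
    {"age": "30-39", "zip": "750**", "disease": "flu"},
    {"age": "30-39", "zip": "750**", "disease": "asthma"},
    {"age": "40-49", "zip": "751**", "disease": "flu"},
    {"age": "40-49", "zip": "751**", "disease": "flu"},
]

inference_entropy = conditional_entropy(example_records, ["age", "zip"], "disease")
reid_risk = max_reidentification_risk(example_records, ["age", "zip"])
print(inference_entropy, reid_risk)
```

In this toy dataset both groups contain two records, so the worst-case re-identification probability is the same for each, yet the second group shares a single disease value and therefore reveals it with certainty. This is the kind of gap between re-identification risk and inference risk that the abstract says a single metric should capture.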