
Achieving Optimal K-Anonymity Parameters for Big Data
Author(s) -
Mohammed Al-Zobbi,
Seyed Shahrestani,
Chun Ruan
Publication year - 2018
Publication title -
International Journal of Information, Communication Technology and Applications
Language(s) - English
Resource type - Journals
ISSN - 2205-0930
DOI - 10.17972/ijicta20184136
Subject(s) - data anonymization, computer science, anonymity, k anonymity, heuristic, information sensitivity, identification (biology), compromise, private information retrieval, big data, data publishing, personally identifiable information, identifier, data mining, information privacy, computer security, publishing, artificial intelligence, social science, botany, sociology, political science, law, biology, programming language
Datasets containing private and sensitive information are useful for data analytics. Data owners release such data cautiously, using privacy-preserving publishing techniques. The risk of personal re-identification is higher than ever before; social media, for instance, has dramatically increased exposure to privacy violations. The well-known k-anonymity technique offers protection against such exposure. K-anonymity requires each released record to be indistinguishable from at least k − 1 other records on a chosen set of attributes, known as quasi-identifiers. This reduces the risk of personal re-identification, but it may also reduce the usefulness of the released information. The value of k should therefore be chosen carefully, to balance security against information gain. Unfortunately, there is no standard procedure for determining the value of k, and the problem of optimal k-anonymization is NP-hard. In this paper, we propose a greedy heuristic approach that provides an optimal value for k. The approach evaluates the empirical risk with respect to our Sensitivity-Based Anonymization method. It is derived from the fine-grained access and business role anonymization for big data that forms our framework.
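The k-anonymity property described in the abstract can be illustrated with a minimal sketch. It is not the paper's Sensitivity-Based Anonymization method or its greedy heuristic; it only measures the k that a table satisfies and shows how generalizing a quasi-identifier raises it. The records, the field names (`age`, `zip`, `diagnosis`), and the helper functions `anonymity_level` and `generalize_age` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def anonymity_level(records, quasi_identifiers):
    """Return the largest k for which `records` is k-anonymous.

    Records that share the same quasi-identifier values form one
    equivalence class; k is the size of the smallest class.
    """
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


def generalize_age(record, width):
    """Replace an exact age with a range of `width` years (hypothetical rule)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical data: "age" and "zip" act as quasi-identifiers,
# "diagnosis" is the sensitive attribute.
records = [
    {"age": 23, "zip": "2150", "diagnosis": "flu"},
    {"age": 27, "zip": "2150", "diagnosis": "asthma"},
    {"age": 31, "zip": "2150", "diagnosis": "flu"},
    {"age": 38, "zip": "2150", "diagnosis": "diabetes"},
]
quasi_identifiers = ["age", "zip"]

# Every raw record is unique on (age, zip), so the table is only 1-anonymous.
raw_k = anonymity_level(records, quasi_identifiers)

# Generalizing age into 10-year ranges merges records into larger classes.
generalized = [generalize_age(r, 10) for r in records]
generalized_k = anonymity_level(generalized, quasi_identifiers)

print(raw_k, generalized_k)
```

Coarser generalization yields a larger k but less precise ages, which is the trade-off between security and information gain that motivates choosing k carefully.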