Optimal Multivariate 2-Microaggregation for Microdata Protection: A 2-Approximation
Author(s) -
Josep DomingoFerrer,
Francesc Sebé
Publication year - 2006
Publication title -
lecture notes in computer science
Language(s) - English
Resource type - Book series
SCImago Journal Rank - 0.249
H-Index - 400
eISSN - 1611-3349
pISSN - 0302-9743
ISBN - 3-540-49330-1
DOI - 10.1007/11930242_12
Subject(s) - heuristics , approximation algorithm , computer science , microdata (statistics) , k anonymity , cluster analysis , time complexity , heuristic , combinatorics , mathematics , algorithm , information privacy , artificial intelligence , population , demography , internet privacy , sociology , census , operating system
Microaggregation is a special clustering problem where the goal is to cluster a set of points into groups of at least k points in such a way that groups are as homogeneous as possible. Microaggregation arises in connection with anonymization of statistical databases for privacy protection (k-anonymity), where points are assimilated to database records. A usual group homogeneity criterion is within-groups sum of squares minimization SSE. For multivariate points, optimal microaggregation, i.e. with minimum SSE, has been shown to be NP-hard. Recently, a polynomial-time O(k3)-approximation heuristic has been proposed (previous heuristics in the literature offered no approximation bounds). The special case k=2 (2-microaggregation) is interesting in privacy protection scenarios with neither internal intruders nor outliers, because information loss is lower: smaller groups imply smaller information loss. For 2-microaggregation the existing general approximation can only guarantee a 54-approximation. We give here a new polynomial-time heuristic whose SSE is at most twice the minimum SSE (2-approximation).
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom