Open Access

Effects of Data Anonymization by Cell Suppression on Descriptive Statistics and Predictive Modeling Performance

Author(s) - Lucila Ohno-Machado

Publication year - 2002

Publication title - Journal of the American Medical Informatics Association

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.614

H-Index - 150

eISSN - 1527-974X

pISSN - 1067-5027

DOI - 10.1197/jamia.m1241

Subject(s) - computer science, data anonymization, data mining, table (database), feature (linguistics), anonymity, confidentiality, statistics, information privacy, mathematics, computer security, linguistics, philosophy

Protecting individual data in disclosed databases is essential. Data anonymization strategies can produce table ambiguation by suppressing selected cells. With table ambiguation, different degrees of anonymization can be achieved, depending on the number of individuals from whom a particular case must become indistinguishable; this number defines the level of anonymization. Anonymization by cell suppression does not necessarily prevent inferences from being made from the disclosed data, and preventing such inferences may be important for preserving confidentiality. We show that anonymized data sets can preserve descriptive characteristics of the data but might also be used to make inferences about particular individuals, which may not be desirable. The degradation of predictive performance is directly proportional to the degree of anonymity. As an example, we report the effect of anonymization on the predictive performance of a model constructed to estimate the probability of disease given clinical findings.
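
As a rough illustration of the "level of anonymization" idea in the abstract, the Python sketch below counts how many other records each record is indistinguishable from once selected cells are suppressed. It is not the paper's algorithm. It assumes that a suppressed cell (represented as None) is compatible with any value, and the function names and the sample table are hypothetical.

```python
from typing import Optional, Sequence, Set, Tuple

Row = Tuple[Optional[str], ...]


def matches(a: Row, b: Row) -> bool:
    """Two rows are indistinguishable if every column agrees or is suppressed.

    Assumption: a suppressed cell (None) is compatible with any value.
    """
    return all(x is None or y is None or x == y for x, y in zip(a, b))


def anonymity_level(table: Sequence[Row]) -> int:
    """Smallest number of OTHER rows that any single row is indistinguishable from."""
    if not table:
        return 0
    return min(
        sum(1 for j, other in enumerate(table) if j != i and matches(row, other))
        for i, row in enumerate(table)
    )


def suppress_cells(table: Sequence[Row], cells: Set[Tuple[int, int]]) -> list:
    """Return a copy of the table with the given (row, column) cells set to None."""
    return [
        tuple(None if (i, j) in cells else value for j, value in enumerate(row))
        for i, row in enumerate(table)
    ]


# Hypothetical table of (sex, age band) values; not data from the paper.
table = [
    ("F", "30-39"),
    ("F", "30-39"),
    ("M", "30-39"),
    ("M", "40-49"),
]

level_before = anonymity_level(table)  # rows 2 and 3 are unique, so the level is 0

# Suppress the age band of the two unique rows so they match each other.
ambiguated = suppress_cells(table, {(2, 1), (3, 1)})
level_after = anonymity_level(ambiguated)  # every row now matches one other row
```

In this toy case, suppressing two cells raises the level from 0 to 1, at the cost of losing the age information for two records. That loss is the kind of trade-off between anonymity and usable information that the abstract describes.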
