Protecting Privacy Using k-Anonymity
Open Access
- 1 September 2008
- journal article
- research article
- Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association
- Vol. 15 (5), 627-637
- https://doi.org/10.1197/jamia.m2716
Abstract
Objective: There is increasing pressure to share health information and even make it publicly available. However, such disclosures of personal health information raise serious privacy concerns. To alleviate such concerns, it is possible to anonymize the data before disclosure. One popular anonymization approach is k-anonymity. There have been no evaluations of the actual re-identification probability of k-anonymized data sets.
Design: Through a simulation, we evaluated the re-identification risk of k-anonymization and three different improvements on three large data sets.
Measurement: Re-identification probability is measured under two different re-identification scenarios. Information loss is measured by the commonly used discernability metric.
Results: For one of the re-identification scenarios, k-anonymity consistently over-anonymizes data sets, with this over-anonymization being most pronounced with small sampling fractions. Over-anonymization results in excessive distortions to the data (i.e., high information loss), making the data less useful for subsequent analysis. We found that a hypothesis testing approach provided the best control over re-identification risk and reduced the extent of information loss compared to baseline k-anonymity.
Conclusion: Guidelines are provided on when to use the hypothesis testing approach instead of baseline k-anonymity.
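For readers unfamiliar with the terms in the abstract, the following is a minimal illustrative sketch, not the authors' simulation code. It assumes the textbook definitions: a data set is k-anonymous when every combination of quasi-identifier values is shared by at least k records, and the discernability metric (with no suppressed records) is the sum of squared equivalence class sizes. The function names, the toy records, and the choice of `age` and `zip` as quasi-identifiers are all hypothetical. The last function assumes an attacker who already knows the target is in the disclosed data set; the abstract does not specify its two scenarios, so this may not match either of them exactly.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(n >= k for n in sizes.values())


def discernability(records, quasi_identifiers):
    """Sum of squared equivalence class sizes (assumes no suppressed records)."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return sum(n * n for n in sizes.values())


def max_reidentification_probability(records, quasi_identifiers):
    """1 / (smallest class size), assuming the attacker knows the target is in the data."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return 1 / min(sizes.values())


# Hypothetical toy data: already generalized ages and truncated postal codes.
records = [
    {"age": "30-39", "zip": "K1A", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "K1A", "diagnosis": "flu"},
    {"age": "30-39", "zip": "K1A", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "K2B", "diagnosis": "flu"},
    {"age": "40-49", "zip": "K2B", "diagnosis": "asthma"},
]
qi = ["age", "zip"]

print(is_k_anonymous(records, qi, 2))
print(is_k_anonymous(records, qi, 3))
print(discernability(records, qi))
print(max_reidentification_probability(records, qi))
```

In this sketch, larger equivalence classes lower the re-identification probability but raise the discernability value, which is the trade-off between privacy protection and information loss that the abstract describes.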