A Globally Optimal k-Anonymity Method for the De-Identification of Health Data
Open Access
- 1 September 2009
- journal article
- research article
- Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association
- Vol. 16 (5), 670-682
- https://doi.org/10.1197/jamia.m3144
Abstract
Background: Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Often legislative requirements to obtain consent are waived if the information collected or disclosed is de-identified.
Objective: The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and that is suitable for health datasets.
Design: Authors compared OLA (Optimal Lattice Anonymization) empirically to three existing k-anonymity algorithms, Datafly, Samarati, and Incognito, on six public, hospital, and registry datasets for different values of k and suppression limits.
Measurement: Three information loss metrics were used for the comparison: precision, discernability metric, and non-uniform entropy. Each algorithm's performance speed was also evaluated.
Results: The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution.
Conclusions: For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
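To make the k-anonymity criterion and the discernability metric named in the abstract concrete, the following is a minimal Python sketch. It is not the OLA algorithm and is not taken from the article. The function names, the toy records, and the choice of quasi-identifiers are hypothetical, and the discernability formula is the commonly used definition (each retained record is charged the size of its equivalence class, each suppressed record is charged the size of the whole dataset), which is assumed here rather than confirmed by the abstract.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every equivalence class contains at least k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(n >= k for n in sizes.values())


def discernability(records, quasi_identifiers, k):
    """Discernability metric under an assumed definition.

    Classes with fewer than k records are treated as suppressed: each
    suppressed record costs len(records); each retained record costs the
    size of its own equivalence class.
    """
    total = len(records)
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return sum(n * n if n >= k else n * total for n in sizes.values())


# Hypothetical, already-generalized toy records (not from the article's datasets).
records = [
    {"age": "30-39", "sex": "F", "diagnosis": "A"},
    {"age": "30-39", "sex": "F", "diagnosis": "B"},
    {"age": "40-49", "sex": "M", "diagnosis": "C"},
    {"age": "40-49", "sex": "M", "diagnosis": "A"},
    {"age": "50-59", "sex": "F", "diagnosis": "B"},
]
quasi_identifiers = ["age", "sex"]

print(is_k_anonymous(records, quasi_identifiers, 2))      # False: one class has a single record
print(is_k_anonymous(records[:4], quasi_identifiers, 2))  # True: both classes have two records
print(discernability(records, quasi_identifiers, 2))      # 2*2 + 2*2 + 1*5 = 13
```

In this sketch the fifth record forms an equivalence class of size one, so the dataset fails 2-anonymity until that record is suppressed or generalized further. That trade-off between suppression and generalization is what information loss metrics such as discernability are meant to quantify.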