A Globally Optimal k-Anonymity Method for the De-Identification of Health Data

Open Access

1 September 2009

journal article
research article
Published by Oxford University Press (OUP) in Journal of the American Medical Informatics Association

Vol. 16 (5), 670-682
https://doi.org/10.1197/jamia.m3144

Abstract

Background: Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Legislative requirements to obtain consent are often waived if the information collected or disclosed is de-identified.

Objective: The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and that is suitable for health datasets.

Design: The authors compared OLA (Optimal Lattice Anonymization) empirically to three existing k-anonymity algorithms (Datafly, Samarati, and Incognito) on six public, hospital, and registry datasets for different values of k and suppression limits.

Measurement: Three information loss metrics were used for the comparison: precision, discernability metric, and non-uniform entropy. The speed of each algorithm was also evaluated.

Results: The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution.

Conclusions: For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
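To make the k-anonymity criterion, the suppression limit, and the discernability metric mentioned in the abstract concrete, the following minimal Python sketch may help. It is not the OLA algorithm and is not taken from the article. The function names, the sample records, and the field names `age` and `postal` are hypothetical. The discernability formula used here is the commonly cited one (each retained record is charged the size of its equivalence class, each suppressed record is charged the size of the whole dataset); the abstract itself does not define it, so treat that as an assumption.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count the records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def satisfies_k_anonymity(records, quasi_identifiers, k, max_suppression=0.0):
    """Check k-anonymity, allowing records in small classes to be suppressed.

    max_suppression is the largest fraction of records that may be suppressed
    (an assumed interpretation of the "suppression limit").
    """
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    # Records in classes smaller than k would have to be suppressed.
    to_suppress = sum(n for n in sizes.values() if n < k)
    return to_suppress <= max_suppression * len(records)


def discernability_metric(records, quasi_identifiers, k):
    """Discernability metric under the assumed, commonly cited definition.

    Classes of size >= k contribute size**2; records in smaller classes are
    treated as suppressed and each contributes the total number of records.
    """
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    total = len(records)
    return sum(n * n if n >= k else n * total for n in sizes.values())


# Hypothetical, already generalized records (age ranges, 3-character postal codes).
records = [
    {"age": "30-39", "postal": "K1A"},
    {"age": "30-39", "postal": "K1A"},
    {"age": "30-39", "postal": "K1A"},
    {"age": "40-49", "postal": "K2B"},
    {"age": "40-49", "postal": "K2B"},
    {"age": "50-59", "postal": "K3C"},
]
qi = ["age", "postal"]

strict = satisfies_k_anonymity(records, qi, k=2)                        # no suppression allowed
relaxed = satisfies_k_anonymity(records, qi, k=2, max_suppression=0.2)  # up to 20% suppressed
dm = discernability_metric(records, qi, k=2)
```

In this toy dataset the single record in the `50-59`/`K3C` class violates 2-anonymity, so the strict check fails, while a 20% suppression limit lets that one record be suppressed. A lower discernability value indicates less information loss under this definition.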

Cited by 166 articles