Automatic de-identification of textual documents in the electronic health record: a review of recent research

Open Access

2 August 2010

journal article
review article
Published by Springer Science and Business Media LLC in BMC Medical Research Methodology

Vol. 10 (1), 70
https://doi.org/10.1186/1471-2288-10-70

Abstract

In the United States, the Health Insurance Portability and Accountability Act (HIPAA) protects the confidentiality of patient data and requires both the informed consent of the patient and the approval of the Institutional Review Board before data can be used for research purposes; these requirements can be waived if the data is de-identified. For clinical data to be considered de-identified, the HIPAA "Safe Harbor" method requires the removal of 18 data elements (called PHI: Protected Health Information). The de-identification of narrative text documents is often performed manually and requires significant resources. Aware of these issues, several authors have investigated automated de-identification of narrative text documents from the electronic health record, and a review of recent research in this domain is presented here.
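
As a rough illustration of what automated de-identification of narrative text involves, the sketch below replaces a few kinds of identifiers with category placeholders using regular expressions. It is a minimal sketch, not a method taken from the reviewed systems: the pattern set, the placeholder labels, and the function name `deidentify` are assumptions made for this example, and the patterns cover only a handful of the 18 Safe Harbor elements (names and addresses, for instance, would not be caught).

```python
import re

# Hypothetical patterns for a few PHI categories (assumed for illustration;
# this is far from a complete Safe Harbor list).
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def deidentify(text: str) -> str:
    """Replace every match of each pattern with a bracketed category label."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Seen on 03/14/2009; call 555-123-4567 or jdoe@example.org. SSN 123-45-6789."
print(deidentify(note))
```

A pattern-only approach like this misses any identifier that has no fixed surface form, so it should be read as a starting point rather than as a compliant de-identifier.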

Cited by 205 articles