A comparative study of current Clinical Natural Language Processing systems on handling abbreviations in discharge summaries.

3 November 2012

journal article
research article

Vol. 2012, 997-1003

Abstract

Clinical Natural Language Processing (NLP) systems extract clinical information from narrative clinical texts in many settings. Previous research mentions the challenges of handling abbreviations in clinical texts, but provides little insight into how well current NLP systems recognize and interpret abbreviations. In this paper, we compared the performance of three existing clinical NLP systems in handling abbreviations: MetaMap, MedLEE, and cTAKES. The evaluation used an expert-annotated gold standard set of clinical documents (derived from 32 de-identified patient discharge summaries) containing 1,112 abbreviations. The existing NLP systems achieved suboptimal performance in abbreviation identification, with F-scores ranging from 0.165 to 0.601. MedLEE achieved the best F-score: 0.601 for all abbreviations and 0.705 for clinically relevant abbreviations. This study suggests that accurate identification of clinical abbreviations is a challenging task and that more advanced abbreviation recognition modules might improve existing clinical NLP systems.
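
For readers unfamiliar with the metric, the following is a minimal sketch of how an F-score for abbreviation identification is commonly computed against a gold standard. It assumes the standard balanced F-score (harmonic mean of precision and recall) and exact matching of annotations; the abstract does not state the matching criteria used in the study, and the function name `f_score` and the toy annotations are hypothetical, not data from the paper.

```python
def f_score(gold, predicted):
    """Return (precision, recall, F1) for two collections of annotations.

    Each annotation is a hashable tuple; here we assume
    (document id, abbreviation, expanded sense) and exact matching.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy annotations, for illustration only.
gold = {
    ("doc1", "CHF", "congestive heart failure"),
    ("doc1", "BP", "blood pressure"),
    ("doc2", "MI", "myocardial infarction"),
    ("doc2", "pt", "patient"),
}
predicted = {
    ("doc1", "CHF", "congestive heart failure"),
    ("doc1", "BP", "blood pressure"),
    ("doc2", "MI", "mitral insufficiency"),  # abbreviation found, wrong sense
}

precision, recall, f1 = f_score(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this sketch an abbreviation that is detected but given the wrong sense counts against both precision and recall, which is one reasonable reading of "recognize and interpret"; a looser evaluation could score detection and sense assignment separately.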

