Abbreviation definition identification based on automatic precision estimates

Open Access

25 September 2008

journal article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 9 (1), 402
https://doi.org/10.1186/1471-2105-9-402

Abstract

The rapid growth of the biomedical literature presents challenges for automatic text processing, one of which is abbreviation identification. Unrecognized abbreviations in text hinder indexing algorithms and adversely affect information retrieval and extraction. Automatic abbreviation definition identification can help resolve these issues. However, abbreviations and definitions identified by an automatic process are of uncertain validity, and because of the size of databases such as MEDLINE, only a small fraction of the extracted abbreviation-definition pairs can be examined manually. An automatic way to estimate the accuracy of abbreviation-definition pairs extracted from text is therefore needed. In this paper we propose an abbreviation definition identification algorithm that employs a variety of strategies to identify the most probable definition of an abbreviation. In addition, our algorithm produces an accuracy estimate, pseudo-precision, for each strategy without using a human-judged gold standard. The pseudo-precisions determine the order in which the algorithm applies the strategies when seeking the definition of an abbreviation.
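The abstract states that strategies are applied in order of their pseudo-precision. The following is a minimal sketch of that control flow only, under stated assumptions: the two strategies (`first_letters`, and `any_letters`, a right-to-left character-matching heuristic in the style of Schwartz and Hearst), the word-window rule, and the pseudo-precision numbers are all hypothetical placeholders. The abstract does not say which strategies the authors use or how pseudo-precision is computed, so this is not their implementation.

```python
import re
from typing import Callable, Optional

# A strategy maps (candidate window, abbreviation) -> definition or None.
# Both strategies below are hypothetical illustrations, not the paper's.
Strategy = Callable[[str, str], Optional[str]]

# "(ABBR)" with no whitespace or nested parentheses inside.
PAREN = re.compile(r"\(([^()\s]+)\)")


def candidate_window(prefix: str, abbr: str) -> str:
    """Keep only the last min(|A| + 5, 2|A|) words before the parenthesis."""
    n = min(len(abbr) + 5, 2 * len(abbr))
    return " ".join(prefix.split()[-n:])


def first_letters(window: str, abbr: str) -> Optional[str]:
    """Hypothetical strategy: initials of the last |A| words spell the abbreviation."""
    words = window.split()
    if len(words) < len(abbr):
        return None
    cand = words[-len(abbr):]
    if "".join(w[0] for w in cand).lower() == abbr.lower():
        return " ".join(cand)
    return None


def any_letters(window: str, abbr: str) -> Optional[str]:
    """Hypothetical strategy: match abbreviation characters right to left
    (Schwartz-Hearst style); the first character must start a word."""
    i = len(window) - 1
    for j in range(len(abbr) - 1, -1, -1):
        c = abbr[j].lower()
        if not c.isalnum():
            continue  # ignore hyphens and similar punctuation in the abbreviation
        while i >= 0 and (window[i].lower() != c or
                          (j == 0 and i > 0 and window[i - 1].isalnum())):
            i -= 1
        if i < 0:
            return None
        i -= 1  # move past the matched character
    # Extend left to the start of the word containing the first match.
    start = window.rfind(" ", 0, i + 1) + 1
    return window[start:]


def identify(window: str, abbr: str,
             strategies: list[tuple[str, Strategy, float]]
             ) -> Optional[tuple[str, str, float]]:
    """Try strategies from highest to lowest pseudo-precision; first hit wins."""
    for name, strategy, pp in sorted(strategies, key=lambda s: s[2], reverse=True):
        definition = strategy(window, abbr)
        if definition:
            return name, definition, pp
    return None


def find_pairs(text: str, strategies: list[tuple[str, Strategy, float]]
               ) -> list[tuple[str, str, str, float]]:
    """Return (abbreviation, strategy, definition, pseudo-precision) tuples."""
    pairs = []
    for m in PAREN.finditer(text):
        abbr = m.group(1)
        if not any(ch.isalpha() for ch in abbr):
            continue  # skip numeric asides such as "(1998)"
        hit = identify(candidate_window(text[:m.start()], abbr), abbr, strategies)
        if hit is not None:
            pairs.append((abbr, *hit))
    return pairs


# Placeholder pseudo-precisions; the paper estimates these automatically,
# by a method not described in the abstract.
STRATEGIES = [
    ("any_letters", any_letters, 0.80),
    ("first_letters", first_letters, 0.95),
]

text = ("Levels of tumor necrosis factor (TNF) were high in patients "
        "with hepatocellular carcinoma (HCC).")
for abbr, name, definition, pp in find_pairs(text, STRATEGIES):
    print(f"{abbr}\t{definition}\t{name}\t{pp:.2f}")
```

On the sample sentence, `first_letters` (the higher placeholder pseudo-precision) resolves TNF as "tumor necrosis factor". It fails on HCC, so the loop falls through to `any_letters`, which returns "hepatocellular carcinoma". Sorting by the estimates means the strategy believed to be most reliable always gets the first chance at each abbreviation, and each accepted pair carries the estimate of the strategy that produced it.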

Cited by 86 articles