A deep database of medical abbreviations and acronyms for natural language processing
Open Access
- 2 June 2021
- journal article
- research article
- Published by Springer Science and Business Media LLC in Scientific Data
- Vol. 8 (1), 1-9
- https://doi.org/10.1038/s41597-021-00929-4
Abstract
The recognition, disambiguation, and expansion of medical abbreviations and acronyms are of utmost importance to prevent medically dangerous misinterpretation in natural language processing. To support recognition, disambiguation, and expansion, we present the Medical Abbreviation and Acronym Meta-Inventory, a deep database of medical abbreviations. A systematic harmonization of eight source inventories across multiple healthcare specialties and settings identified 104,057 abbreviations with 170,426 corresponding senses. Automated cross-mapping of synonymous records using state-of-the-art machine learning reduced redundancy, which simplifies future application. Additional features include semi-automated quality control to remove errors. The Meta-Inventory demonstrated high completeness, or coverage, of abbreviations and senses in new clinical text, a substantial improvement over the next largest repository (6–14% increase in abbreviation coverage; 28–52% increase in sense coverage). To our knowledge, the Meta-Inventory is the most complete compilation of medical abbreviations and acronyms in American English to date. The multiple sources and high coverage support application in varied specialties and settings. This allows for cross-institutional natural language processing, which previous inventories did not support. The Meta-Inventory is available at https://bit.ly/github-clinical-abbreviations.
Funding Information
- U.S. Department of Health & Human Services | NIH | U.S. National Library of Medicine (F31LM013054)