Assessment of disease named entity recognition on a corpus of annotated sentences
Open Access
- 11 April 2008
- journal article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 9 (S3), S3
- https://doi.org/10.1186/1471-2105-9-s3-s3
Abstract
In recent years, the recognition of semantic types in the biomedical scientific literature has focused on named entities such as protein and gene names (PGNs) and Gene Ontology terms (GO terms). Other semantic types, such as diseases, have not received the same level of attention. Different solutions have been proposed to identify disease named entities in the scientific literature. While matching the terminology with language patterns suffers from low recall (e.g., Whatizit), other solutions make use of morpho-syntactic features to better cover the full scope of terminological variability (e.g., MetaMap). Currently, MetaMap, which is provided by the National Library of Medicine (NLM), is the state-of-the-art solution for the annotation of concepts from UMLS (Unified Medical Language System) in the literature. Nonetheless, its performance has not yet been assessed on an annotated corpus. In addition, little effort has been invested so far to generate an annotated dataset that links disease entities in text to disease entries in a database, thesaurus or ontology and that could serve as a gold standard to benchmark text mining solutions.