Assessment of disease named entity recognition on a corpus of annotated sentences

Open Access

11 April 2008

journal article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 9 (S3), S3
https://doi.org/10.1186/1471-2105-9-s3-s3

Abstract

In recent years, the recognition of semantic types in the biomedical scientific literature has focused on named entities such as protein and gene names (PGNs) and Gene Ontology terms (GO terms). Other semantic types, such as diseases, have not received the same level of attention. Different solutions have been proposed to identify disease named entities in the scientific literature. While matching the terminology with language patterns suffers from low recall (e.g., Whatizit), other solutions make use of morpho-syntactic features to better cover the full scope of terminological variability (e.g., MetaMap). Currently, MetaMap, which is provided by the National Library of Medicine (NLM), is the state-of-the-art solution for annotating concepts from the UMLS (Unified Medical Language System) in the literature. Nonetheless, its performance has not yet been assessed on an annotated corpus. In addition, little effort has been invested so far in generating an annotated dataset that links disease entities in text to disease entries in a database, thesaurus or ontology and that could serve as a gold standard to benchmark text mining solutions.

Cited by 79 articles