Chemical Entity Recognition and Resolution to ChEBI

Open Access

15 February 2012

journal article
Published by Hindawi Limited in ISRN Bioinformatics

Vol. 2012, 1-9
https://doi.org/10.5402/2012/619427

Abstract

Chemical entities are ubiquitous throughout the biomedical literature, and text-mining systems that can efficiently identify them are required. Owing to the lack of available corpora and data resources, the community has focused its efforts on developing gene and protein named entity recognition systems, but with the release of ChEBI and the availability of an annotated corpus, chemical entity recognition can now be addressed. We developed a machine-learning-based method for chemical entity recognition and a lexical-similarity-based method for chemical entity resolution and compared them with Whatizit, a popular dictionary-based method. Our methods outperformed the dictionary-based method in all tasks, yielding an improvement in F-measure of 20% for the entity recognition task, 2–5% for the entity resolution task, and 15% for the combined entity recognition and resolution task.
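
To make the idea of lexical-similarity-based resolution concrete, the following is a minimal sketch, not the method used in the article. It assumes a toy name-to-identifier table (the `LEXICON` dictionary, the `resolve` function, and the 0.8 threshold are all hypothetical) and uses the character-level ratio from Python's standard `difflib` module as a stand-in for whatever similarity measure the authors employed.

```python
from difflib import SequenceMatcher

# Toy lexicon (assumption): a tiny stand-in for a name-to-identifier
# table that would in practice be built from the ChEBI database.
LEXICON = {
    "water": "CHEBI:15377",
    "ethanol": "CHEBI:16236",
    "caffeine": "CHEBI:27732",
}


def similarity(a, b):
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(mention, lexicon=LEXICON, threshold=0.8):
    """Map a recognized mention to the most similar lexicon entry.

    Returns (identifier, score), or None when no entry reaches the
    threshold (the threshold value is an illustrative assumption).
    """
    best_name = max(lexicon, key=lambda name: similarity(mention, name))
    score = similarity(mention, best_name)
    if score < threshold:
        return None
    return lexicon[best_name], score


if __name__ == "__main__":
    for mention in ["Ethanol", "caffiene", "xyz"]:
        print(mention, "->", resolve(mention))
```

Unlike exact dictionary lookup, a similarity threshold of this kind tolerates spelling variants such as "caffiene", at the cost of possible false matches when the threshold is set too low.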

Funding Information

European Commission (231807)

