Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II

Open Access

1 January 2012

journal article
research article
Published by Oxford University Press (OUP) in Database: The Journal of Biological Databases and Curation

Vol. 2012, bas043
https://doi.org/10.1093/database/bas043

Abstract

Manual curation of data from the biomedical literature is a rate-limiting factor for many expert-curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert-curated databases. To close this gap and better understand all aspects of literature curation, we invited submissions of written descriptions of curation workflows from expert-curated databases for the BioCreative 2012 Workshop Track II. We received seven qualified contributions, primarily from model organism databases. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used and the current and desired uses of text mining for biocuration. Compared with a survey done in 2009, our 2012 results show that many more databases are now using text mining in parts of their curation workflows. In addition, the workshop participants identified text-mining aids for finding gene names and symbols (gene indexing), prioritization of documents for curation (document triage) and ontology concept assignment as those most desired by the biocurators.

Database URL: http://www.biocreative.org/tasks/bc-workshop-2012/workflow/

