Database Citation in Full Text Biomedical Articles

Open Access

29 May 2013

journal article
research article
Published by Public Library of Science (PLoS) in PLOS ONE

Vol. 8 (5), e63184
https://doi.org/10.1371/journal.pone.0063184

Abstract

Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires structured cross-references at the level of individual articles and biological records. Here, we describe current patterns of how database entries are cited in research articles, based on an analysis of the full-text Open Access articles available from Europe PMC. Focusing on citations of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Text mining also finds many thousands of new literature-database relationships, because these relationships are absent from the set of articles cited by database records. We recommend that structured annotation of database records in articles be extended to other databases, such as ArrayExpress and Pfam, whose entries are also widely cited in the literature. The very high precision and high throughput of this text-mining pipeline make this activity possible both accurately and at low cost, which will allow the development of new integrated data services.
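
The idea of mining database record citations from article text can be illustrated with a minimal sketch. It is not the pipeline described in the article: the accession patterns are simplified, and the context cues, the names PATTERNS and CUES, and the function find_citations are assumptions made for illustration. A candidate accession is reported only when the sentence also names the database, which is one simple way to keep precision high for ambiguous identifiers such as four-character PDB codes.

```python
import re

# Simplified accession patterns (illustrative assumptions, not the article's rules).
PATTERNS = {
    "uniprot": re.compile(
        r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
    ),
    # PDB codes: one digit (1-9) followed by three alphanumerics.
    "pdb": re.compile(r"\b[1-9][A-Za-z0-9]{3}\b"),
    # Classic nucleotide accessions: 1 letter + 5 digits, or 2 letters + 6 digits.
    "ena": re.compile(r"\b(?:[A-Z][0-9]{5}|[A-Z]{2}[0-9]{6})\b"),
}

# Hypothetical context cues: the database must be named in the same sentence.
CUES = {
    "uniprot": re.compile(r"\b(?:uniprot|swiss-prot)\b", re.IGNORECASE),
    "pdb": re.compile(r"\b(?:pdb|protein data bank)\b", re.IGNORECASE),
    "ena": re.compile(r"\b(?:ena|embl|genbank)\b", re.IGNORECASE),
}


def find_citations(sentence):
    """Return (database, accession) pairs found in one sentence."""
    hits = []
    for db, pattern in PATTERNS.items():
        # Skip databases that the sentence does not mention by name.
        if not CUES[db].search(sentence):
            continue
        for match in pattern.finditer(sentence):
            hits.append((db, match.group(0)))
    return hits


if __name__ == "__main__":
    text = "The structure was deposited in the PDB under code 1ABC."
    print(find_citations(text))
```

Requiring a database name in the same sentence trades recall for precision. Without it, a year such as 2013 would match the PDB pattern, and a UniProt accession such as P12345 would also match the nucleotide pattern.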
