Validating Annotations for Uncharacterized Proteins in Shewanella oneidensis

1 September 2008

journal article
research article
Published by Mary Ann Liebert Inc in OMICS: A Journal of Integrative Biology

Vol. 12 (3), 211-215
https://doi.org/10.1089/omi.2008.0051

Abstract

Proteins of unknown function are a barrier to our understanding of molecular biology. Assigning function to these “uncharacterized” proteins is imperative but challenging. The usual approach is to run similarity searches against annotation databases, which are useful for predicting function. However, because the performance of these databases on uncharacterized proteins is largely unknown, the accuracy of their predictions is suspect, which makes annotation difficult. To address this challenge, we developed a benchmark annotation dataset of 30 proteins in Shewanella oneidensis. The proteins in the dataset were uncharacterized after the initial annotation of the S. oneidensis proteome in 2002. In the intervening 5 years, the accumulation of new experimental evidence has enabled specific functions to be predicted. We used this benchmark dataset to evaluate several commonly used annotation databases. According to our criteria, six annotation databases accurately predicted functions for at least 60% of the proteins in our dataset. Two of these six even had a “conditional accuracy” of 90%. Conditional accuracy, a second evaluation metric we developed, excludes cases in which a database predicted no function. In addition, the functions of 27 of the 30 proteins were correctly predicted by at least one database. This work represents one of the first performance evaluations of annotation databases on uncharacterized proteins. Our evaluation indicates that these databases readily incorporate new information and are accurate in predicting functions for uncharacterized proteins, provided that experimental evidence of function exists.
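
The difference between the two metrics can be illustrated with a minimal sketch. It assumes, based only on the description in the abstract, that accuracy is the number of correct predictions divided by all proteins in the benchmark, and that conditional accuracy is the number of correct predictions divided by only those proteins for which the database returned a prediction. The function names and the example counts are hypothetical and are not taken from the article.

```python
from typing import Optional, Sequence

# Each outcome is True (correct prediction), False (incorrect prediction),
# or None (the database predicted no function). This encoding is an
# assumption made for illustration.
Outcome = Optional[bool]


def accuracy(outcomes: Sequence[Outcome]) -> float:
    """Correct predictions over all proteins; a missing prediction counts as a miss."""
    if not outcomes:
        raise ValueError("no outcomes given")
    correct = sum(1 for o in outcomes if o is True)
    return correct / len(outcomes)


def conditional_accuracy(outcomes: Sequence[Outcome]) -> float:
    """Correct predictions over only the proteins that received a prediction."""
    predicted = [o for o in outcomes if o is not None]
    if not predicted:
        raise ValueError("the database made no predictions")
    correct = sum(1 for o in predicted if o is True)
    return correct / len(predicted)


# Hypothetical database: 30 proteins, 18 correct, 2 incorrect, 10 without a prediction.
outcomes = [True] * 18 + [False] * 2 + [None] * 10
overall = accuracy(outcomes)                  # 18 of 30 proteins
conditional = conditional_accuracy(outcomes)  # 18 of 20 predicted proteins
print(f"accuracy={overall:.2f}, conditional accuracy={conditional:.2f}")
```

Under these assumptions, a database that rarely predicts but is usually right when it does scores much higher on conditional accuracy than on overall accuracy.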
