Validating Annotations for Uncharacterized Proteins in Shewanella oneidensis
- 1 September 2008
- journal article
- research article
- Published by Mary Ann Liebert Inc in OMICS: A Journal of Integrative Biology
- Vol. 12 (3), 211-215
- https://doi.org/10.1089/omi.2008.0051
Abstract
Proteins of unknown function are a barrier to our understanding of molecular biology. Assigning function to these “uncharacterized” proteins is imperative, but challenging. The usual approach is similarity searches using annotation databases, which are useful for predicting function. However, since the performance of these databases on uncharacterized proteins is basically unknown, the accuracy of their predictions is suspect, making annotation difficult. To address this challenge, we developed a benchmark annotation dataset of 30 proteins in Shewanella oneidensis. The proteins in the dataset were originally uncharacterized after the initial annotation of the S. oneidensis proteome in 2002. In the intervening 5 years, the accumulation of new experimental evidence has enabled specific functions to be predicted. We utilized this benchmark dataset to evaluate several commonly utilized annotation databases. According to our criteria, six annotation databases accurately predicted functions for at least 60% of proteins in our dataset. Two of these six even had a “conditional accuracy” of 90%. Conditional accuracy is another evaluation metric we developed, which excludes results from databases where no function was predicted. Also, the functions of 27 of the 30 proteins were correctly predicted by at least one database. These results represent one of the first performance evaluations of annotation databases on uncharacterized proteins. Our evaluation indicates that these databases readily incorporate new information and are accurate in predicting functions for uncharacterized proteins, provided that experimental function evidence exists.
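The difference between the two metrics can be illustrated with a minimal sketch. It assumes that accuracy is the fraction of all benchmark proteins whose function a database predicted correctly, and that conditional accuracy is the same fraction computed only over proteins for which the database returned a prediction; the abstract does not give the exact formulas, so both definitions are assumptions. The function names and the example counts are hypothetical and are not data from the study.

```python
from typing import Optional, Sequence

# Each entry describes one benchmark protein for a single database:
#   True  -> the database predicted the correct function
#   False -> the database predicted an incorrect function
#   None  -> the database predicted no function at all


def accuracy(results: Sequence[Optional[bool]]) -> float:
    """Correct predictions divided by all proteins (assumed definition)."""
    if not results:
        raise ValueError("results must not be empty")
    correct = sum(1 for r in results if r is True)
    return correct / len(results)


def conditional_accuracy(results: Sequence[Optional[bool]]) -> float:
    """Correct predictions divided by proteins that received any prediction
    (assumed definition: 'no function predicted' cases are excluded)."""
    attempted = [r for r in results if r is not None]
    if not attempted:
        raise ValueError("the database made no predictions")
    correct = sum(1 for r in attempted if r is True)
    return correct / len(attempted)


# Hypothetical counts for one database on a 30-protein benchmark:
# 18 correct, 2 incorrect, 10 with no prediction.
example = [True] * 18 + [False] * 2 + [None] * 10

print(f"accuracy: {accuracy(example):.2f}")
print(f"conditional accuracy: {conditional_accuracy(example):.2f}")
```

Under these assumptions, a database that stays silent on many proteins can have a modest accuracy but a high conditional accuracy, because the second metric only counts the proteins it attempted.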