Homology-based inference sets the bar high for protein function prediction

Open Access

28 February 2013

journal article
Published by Springer Science and Business Media LLC in BMC Bioinformatics

Vol. 14 (S3), S7
https://doi.org/10.1186/1471-2105-14-s3-s7

Abstract

Any method that predicts protein function de novo should do better than random. More challengingly, it should also outperform simple homology-based inference. Here, we describe a few methods that predict protein function exclusively through homology. Together, they set the bar, or lower limit, for future improvements. During the development of these methods, we faced two surprises. First, our most successful baseline implementation ranked very high at CAFA1. In fact, our best combination of homology-based methods fared only slightly worse than the top-of-the-line prediction method from the Jones group. Second, although the concept of homology-based inference is simple, this work revealed that the precise details of the implementation are crucial: not only did the methods range from top to bottom performers at CAFA, but the reasons for these differences were also unexpected. In this work, we also propose a new rigorous measure to compare predicted and experimental annotations. It puts more emphasis on the details of protein function than the other measures employed by CAFA and may best reflect the expectations of users. Clearly, the definition of proper goals remains one major objective for CAFA.

