Quantitative assessment of protein function prediction from metagenomics shotgun sequences
- 28 August 2007
- journal article
- research article
- Published in Proceedings of the National Academy of Sciences
- Vol. 104 (35), 13913-13918
- https://doi.org/10.1073/pnas.0702636104
Abstract
To assess the potential of protein function prediction in environmental genomics data, we analyzed shotgun sequences from four diverse and complex habitats. Using homology searches as well as customized gene neighborhood methods that incorporate intergenic and evolutionary distances, we inferred specific functions for 76% of the 1.4 million predicted ORFs in these samples (83% when nonspecific functions are considered). Surprisingly, these fractions are only slightly smaller than the corresponding ones in completely sequenced genomes (83% and 86%, respectively, by using the same methodology) and considerably higher than previously thought. For as many as 75,448 ORFs (5% of the total), only neighborhood methods can assign functions, illustrated here by a previously undescribed gene associated with the well characterized heme biosynthesis operon and a potential transcription factor that might regulate a coupling between fatty acid biosynthesis and degradation. Our results further suggest that, although functions can be inferred for most proteins on earth, many functions remain to be discovered in numerous small, rare protein families.