Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: identification of proteins exhibiting the Glutaredoxin/Thioredoxin disulfide oxidoreductase activity
- 2 October 1998
- journal article
- Published by Elsevier BV in Journal of Molecular Biology
- Vol. 282 (4), 703-711
- https://doi.org/10.1006/jmbi.1998.2061
Abstract
The application of an automated method for the screening of protein activity based on the sequence-to-structure-to-function paradigm is presented for the complete Escherichia coli genome. First, the structure of the protein is identified from its sequence using a threading algorithm, which aligns the sequence to the best matching structure in a structural database and extends sequence analysis well beyond the limits of local sequence identity. Then, the active site is identified in the resulting sequence-to-structure alignment using a “fuzzy functional form” (FFF), a three-dimensional descriptor of the active site of a protein. Here, this sequence-to-structure-to-function concept is applied to analysis of the complete E. coli genome, i.e. all E. coli open reading frames (ORFs) are screened for the thiol-disulfide oxidoreductase activity of the glutaredoxin/thioredoxin protein family. We show that the method can identify the active sites in ten sequences that are known or proposed to exhibit this activity. Furthermore, oxidoreductase activity is predicted in two other sequences that have not been identified previously. This method distinguishes protein pairs with similar active sites from protein pairs that are just topological cousins, i.e. those having similar global folds, but not necessarily similar active sites. Thus, this method provides a novel approach for extraction of active site and functional information based on three-dimensional structures, rather than simple sequence analysis. Prediction of protein activity is fully automated and easily extendible to new functions. Finally, it is demonstrated here that the method can be applied to complete genome database analysis.
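The second step described in the abstract, checking a threaded model against a three-dimensional active-site descriptor, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not give the contents of the FFF, so the toy descriptor below (a C-X-X-C cysteine pair plus a nearby proline), the Cα distance ranges, and the function names are hypothetical placeholders, not the published descriptor or the authors' implementation. The sketch also assumes the threading step has already assigned one Cα coordinate to each residue of the query sequence.

```python
import math

# Hypothetical, illustrative Calpha-Calpha distance ranges in angstroms.
# These are NOT the values of the published fuzzy functional form.
CYS_CYS_RANGE = (4.0, 7.0)
CYS_PRO_RANGE = (4.0, 10.0)


def in_range(value, bounds):
    """True if value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


def match_toy_descriptor(sequence, ca_coords):
    """Return (cys1, cys2, pro) index triples that satisfy the toy descriptor.

    sequence  : one-letter amino acid string of the query protein
    ca_coords : one (x, y, z) Calpha coordinate per residue, assumed to come
                from the sequence-to-structure alignment
    """
    if len(sequence) != len(ca_coords):
        raise ValueError("sequence and coordinates must have equal length")

    hits = []
    for i in range(len(sequence) - 3):
        j = i + 3  # second cysteine of a C-X-X-C pair
        if sequence[i] != "C" or sequence[j] != "C":
            continue
        if not in_range(math.dist(ca_coords[i], ca_coords[j]), CYS_CYS_RANGE):
            continue
        # Look for a proline outside the motif that is close to both cysteines.
        for k, residue in enumerate(sequence):
            if residue != "P" or i <= k <= j:
                continue
            near_first = in_range(math.dist(ca_coords[i], ca_coords[k]), CYS_PRO_RANGE)
            near_second = in_range(math.dist(ca_coords[j], ca_coords[k]), CYS_PRO_RANGE)
            if near_first and near_second:
                hits.append((i, j, k))
    return hits
```

Because the residue identities and the geometry are tested together, two models with the same coordinates but different residues at the descriptor positions give different results. That is the sense in which a structure-based descriptor can separate proteins sharing an active site from "topological cousins" that share only a fold.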