Functional analysis of the Escherichia coli genome using the sequence-to-structure-to-function paradigm: identification of proteins exhibiting the Glutaredoxin/Thioredoxin disulfide oxidoreductase activity

2 October 1998

journal article
Published by Elsevier BV in Journal of Molecular Biology

Vol. 282 (4), 703-711
https://doi.org/10.1006/jmbi.1998.2061

Abstract

The application of an automated method for the screening of protein activity based on the sequence-to-structure-to-function paradigm is presented for the complete Escherichia coli genome. First, the structure of the protein is identified from its sequence using a threading algorithm, which aligns the sequence to the best matching structure in a structural database and extends sequence analysis well beyond the limits of local sequence identity. Then, the active site is identified in the resulting sequence-to-structure alignment using a “fuzzy functional form” (FFF), a three-dimensional descriptor of the active site of a protein. Here, this sequence-to-structure-to-function concept is applied to analysis of the complete E. coli genome, i.e. all E. coli open reading frames (ORFs) are screened for the thiol-disulfide oxidoreductase activity of the glutaredoxin/thioredoxin protein family. We show that the method can identify the active sites in ten sequences that are known or proposed to exhibit this activity. Furthermore, oxidoreductase activity is predicted in two other sequences that have not been identified previously. This method distinguishes protein pairs with similar active sites from protein pairs that are just topological cousins, i.e. those having similar global folds, but not necessarily similar active sites. Thus, this method provides a novel approach for extraction of active site and functional information based on three-dimensional structures, rather than simple sequence analysis. Prediction of protein activity is fully automated and easily extendible to new functions. Finally, it is demonstrated here that the method can be applied to complete genome database analysis.
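
The second step described above, checking a sequence-to-structure alignment against a three-dimensional active-site descriptor, can be illustrated with a minimal sketch. This is not the authors' implementation. The descriptor layout, the residue types, the distance ranges, and the names `FFF`, `matches_fff`, and `find_active_sites` are all assumptions made for illustration; the abstract does not specify them. The threading step is assumed to have already produced a list of (amino acid, C-alpha coordinate) pairs.

```python
from itertools import product
from math import dist

# Hypothetical descriptor: required residue types plus allowed C-alpha
# distance ranges in angstroms. All values are illustrative, not from the paper.
FFF = {
    "residues": {"a": "C", "b": "C", "c": "P"},
    "distances": {
        ("a", "b"): (4.0, 7.0),
        ("a", "c"): (5.0, 10.0),
        ("b", "c"): (5.0, 10.0),
    },
}

def matches_fff(aligned, assignment, fff=FFF):
    """Check one assignment of descriptor labels to aligned positions.

    aligned: list of (amino_acid, (x, y, z)) pairs, assumed to come from
             a sequence-to-structure alignment.
    assignment: dict mapping each descriptor label to an index in aligned.
    """
    # Residue identity must match the descriptor.
    for label, amino_acid in fff["residues"].items():
        if aligned[assignment[label]][0] != amino_acid:
            return False
    # Every pairwise distance must fall inside its allowed ("fuzzy") range.
    for (p, q), (low, high) in fff["distances"].items():
        d = dist(aligned[assignment[p]][1], aligned[assignment[q]][1])
        if not low <= d <= high:
            return False
    return True

def find_active_sites(aligned, fff=FFF):
    """Return every assignment of descriptor labels that satisfies the descriptor."""
    labels = list(fff["residues"])
    # Only positions with the right residue type are candidates for a label.
    candidates = [
        [i for i, (aa, _) in enumerate(aligned) if aa == fff["residues"][label]]
        for label in labels
    ]
    hits = []
    for combo in product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # one position cannot fill two descriptor slots
        assignment = dict(zip(labels, combo))
        if matches_fff(aligned, assignment, fff):
            hits.append(assignment)
    return hits

# Toy alignment: two cysteines and a proline within the illustrative ranges.
site_like = [
    ("C", (0.0, 0.0, 0.0)),
    ("G", (3.0, 3.0, 0.0)),
    ("Y", (4.0, 2.0, 0.0)),
    ("C", (5.5, 0.0, 0.0)),
    ("P", (3.0, 7.0, 0.0)),
]
# Same residues, but the proline is far away: similar composition, no site match.
fold_only = site_like[:4] + [("P", (30.0, 30.0, 0.0))]
```

In this toy setting, `find_active_sites(site_like)` reports matches while `find_active_sites(fold_only)` returns an empty list, which mirrors the distinction the abstract draws between proteins that share an active site and proteins that share only a global fold.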
