Large-scale identification of coevolution signals across homo-oligomeric protein interfaces by direct coupling analysis
- 13 March 2017
- journal article
- research article
- Published by Proceedings of the National Academy of Sciences in Proceedings of the National Academy of Sciences of the United States of America
- Vol. 114 (13), E2662-E2671
- https://doi.org/10.1073/pnas.1615068114
Abstract
Proteins have evolved to perform diverse cellular functions, from serving as reaction catalysts to coordinating cellular propagation and development. Frequently, proteins do not exert their full potential as monomers but rather undergo concerted interactions as either homo-oligomers or with other proteins as hetero-oligomers. The experimental study of such protein complexes and interactions has been arduous. Theoretical structure prediction methods are an attractive alternative. Here, we investigate homo-oligomeric interfaces by tracing residue coevolution via the global statistical direct coupling analysis (DCA). DCA can accurately infer spatial adjacencies between residues. These adjacencies can be included as constraints in structure prediction techniques to predict high-resolution models. By taking advantage of the ongoing exponential growth of sequence databases, we go significantly beyond anecdotal cases of a few protein families and apply DCA to a systematic large-scale study of nearly 2,000 Pfam protein families with sufficient sequence information and structurally resolved homo-oligomeric interfaces. We find that large interfaces are commonly identified by DCA. We further demonstrate that DCA can differentiate between subfamilies with different binding modes within one large Pfam family. Sequence-derived contact information for the subfamilies proves sufficient to assemble accurate structural models of the diverse protein-oligomers. Thus, we provide an approach to investigate oligomerization for arbitrary protein families leading to structural models complementary to often-difficult experimental methods. Combined with ever more abundant sequential data, we anticipate that this study will be instrumental to allow the structural description of many heteroprotein complexes in the future.
Funding Information
- Agence Nationale de la Recherche (ANR-13-BS04-0012-01)
- HHS | NIH | National Institute of General Medical Sciences (GM106085)