General strategies for using amino acid sequence data to guide biochemical investigation of protein function

23 November 2022

journal article
research article
Published by Portland Press Ltd. in Biochemical Society Transactions

Vol. 50 (6), 1847-1858
https://doi.org/10.1042/bst20220849

Abstract

The rapid increase of ‘-omics’ data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.
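
As a rough illustration of the first two ideas in the abstract, the sketch below tabulates which amino acids occur at a position of a multiple sequence alignment (the "limited sequence space" at a variable position) and scores covariation between two positions with mutual information. It is a minimal sketch under stated assumptions, not the method of the review: the toy alignment and the function names are hypothetical, and mutual information is used here only as a simple stand-in for more sophisticated coevolution measures such as direct-coupling analysis. Gaps, sequence weighting, and phylogenetic corrections are ignored.

```python
from collections import Counter
from math import log2

# Hypothetical toy alignment: one row per family member, all rows equal length.
ALIGNMENT = [
    "DKAEL",
    "DRAEL",
    "DKSDL",
    "DRSDL",
]


def column(alignment, i):
    """Return the residues found at alignment position i (0-based)."""
    return [seq[i] for seq in alignment]


def residue_frequencies(alignment, i):
    """Fraction of sequences carrying each amino acid at position i."""
    col = column(alignment, i)
    counts = Counter(col)
    return {aa: c / len(col) for aa, c in counts.items()}


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between positions i and j."""
    n = len(alignment)
    col_i, col_j = column(alignment, i), column(alignment, j)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    # Position 0 is invariant; position 1 samples only two residues.
    print(residue_frequencies(ALIGNMENT, 0))
    print(residue_frequencies(ALIGNMENT, 1))
    # Positions 2 and 3 change together; positions 1 and 2 vary independently.
    print(mutual_information(ALIGNMENT, 2, 3))
    print(mutual_information(ALIGNMENT, 1, 2))
```

In this toy case, positions that always change together score higher than positions that vary independently. On real alignments a position pair would normally be compared against a background distribution before being chosen for experimental follow-up.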
