General strategies for using amino acid sequence data to guide biochemical investigation of protein function
- 23 November 2022
- journal article
- research article
- Published by Portland Press Ltd. in Biochemical Society Transactions
- Vol. 50 (6), 1847-1858
- https://doi.org/10.1042/bst20220849
Abstract
The rapid increase of ‘-omics’ data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.
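The first approach rests on scoring how strongly two alignment positions co-vary. As a minimal illustration, the sketch below uses plain mutual information (MI) between columns of a multiple sequence alignment, which is the simplest such score; established methods such as direct-coupling analysis go further by separating direct from indirect couplings and correcting for sampling bias, and none of that is attempted here. The alignment is assumed to be a list of equal-length, pre-aligned strings, and the function names and toy sequences are hypothetical.

```python
import math
from collections import Counter


def column(alignment, i):
    """Residues at alignment position i across all aligned sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Plain mutual information (bits) between alignment columns i and j.

    Uncorrected for phylogeny, small samples or indirect couplings.
    """
    n = len(alignment)
    col_i, col_j = column(alignment, i), column(alignment, j)
    p_i, p_j = Counter(col_i), Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


def top_pairs(alignment, k=3):
    """Rank all column pairs by MI and return the k highest-scoring pairs."""
    length = len(alignment[0])
    scores = [
        ((i, j), mutual_information(alignment, i, j))
        for i in range(length)
        for j in range(i + 1, length)
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]


# Toy alignment: columns 0 and 2 co-vary (D pairs with K, E with R),
# column 1 is fully conserved and column 3 varies independently.
alignment = ["DAKG", "DAKS", "EARG", "EART"]
print(top_pairs(alignment, k=1))  # [((0, 2), 1.0)]
```

A conserved column scores zero against everything, so MI on its own highlights co-varying positions rather than conserved ones; in practice such candidate pairs would then be prioritized for mutagenesis.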
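The second approach notes that variable positions usually draw on only a few of the 20 amino acids. One hedged way to make that concrete is to tabulate residue usage at a position and pick the smallest set of residues that covers most sequences, so that experiments test those substitutions instead of all 19 alternatives. The sketch assumes pre-aligned sequences; the 80% coverage cut-off, the function names and the toy data are illustrative assumptions, not values from the review.

```python
import math
from collections import Counter


def position_profile(alignment, i):
    """Residue counts at alignment position i, most common first."""
    return Counter(seq[i] for seq in alignment).most_common()


def shannon_entropy(alignment, i):
    """Shannon entropy (bits) of position i; 0 means fully conserved."""
    n = len(alignment)
    return -sum((c / n) * math.log2(c / n) for _, c in position_profile(alignment, i))


def dominant_residues(alignment, i, coverage=0.8):
    """Smallest most-common-first set of residues that together account for
    at least `coverage` (a hypothetical threshold) of sequences at position i."""
    n = len(alignment)
    chosen, covered = [], 0
    for residue, count in position_profile(alignment, i):
        chosen.append(residue)
        covered += count
        if covered / n >= coverage:
            break
    return chosen


# Toy alignment: position 0 is conserved (G); position 1 is variable
# but dominated by A and S, with a single rare W.
alignment = ["GA"] * 5 + ["GS"] * 4 + ["GW"]
print(shannon_entropy(alignment, 1))
print(dominant_residues(alignment, 1))  # ['A', 'S']
```

Entropy distinguishes conserved from variable positions, while the coverage set answers the more practical question of which few residues deserve biochemical characterization.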
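The third approach sorts a shared domain into subtypes by its sequence variation. A deliberately simple version, assuming the informative positions have already been chosen (for example with a profile like the one above), groups aligned sequences by the residues they carry at those positions. Real subtype assignment would typically use clustering or profile models; the identifiers, sequences and positions here are hypothetical.

```python
from collections import defaultdict


def sort_into_subtypes(sequences, positions):
    """Group aligned sequences by their residues at the given positions.

    `sequences` maps an identifier to an aligned sequence and `positions`
    are alignment columns chosen in advance. Returns {signature: [ids]},
    largest group first.
    """
    groups = defaultdict(list)
    for name, seq in sequences.items():
        signature = "".join(seq[p] for p in positions)
        groups[signature].append(name)
    return dict(sorted(groups.items(), key=lambda item: len(item[1]), reverse=True))


# Hypothetical aligned domain sequences, sorted on columns 1 and 2.
sequences = {
    "seq1": "MDKLT",
    "seq2": "MDKIT",
    "seq3": "MERLT",
    "seq4": "MDKVS",
}
print(sort_into_subtypes(sequences, [1, 2]))
# {'DK': ['seq1', 'seq2', 'seq4'], 'ER': ['seq3']}
```

Each resulting group is a candidate subtype whose representatives could be compared experimentally.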