Residues crucial for maintaining short paths in network communication mediate signaling in proteins

Abstract
Here, we represent protein structures as residue interacting networks, which are assumed to involve a permanent flow of information between amino acids. By removing nodes from the protein network, we identify fold centrally conserved residues, which are crucial for sustaining the shortest pathways and thus play key roles in long‐range interactions. Analysis of seven protein families (myoglobins, G‐protein‐coupled receptors, the trypsin class of serine proteases, hemoglobins, oligosaccharide phosphorylases, nuclear receptor ligand‐binding domains and retroviral proteases) confirms that many of these residues have been shown experimentally to be important for allosteric communication. The agreement between the centrally conserved residues, which are key in preserving short path lengths, and residues experimentally suggested to mediate signaling further illustrates that topology plays an important role in network communication. Protein folds have evolved under constraints imposed by function. To maintain function, protein structures need to be robust to mutational events. On the other hand, robustness is accompanied by an extreme sensitivity at some crucial sites. Thus, here we propose that centrally conserved residues, whose removal increases the characteristic path length in protein networks, may relate to the system fragility.

### Synopsis

The evolution of a protein fold is determined by the constraints imposed by its function. An important characteristic for maintaining function is the robustness of protein structures to mutagenesis, which allows a degree of sequence plasticity. This robustness is accompanied by an extreme sensitivity to mutations at some sites. It has been shown that protein structures can be represented as small‐world networks of interactions between amino acids, with residues corresponding to vertices and contacts between them representing the edges ([Greene and Higman, 2003][1]).
These networks are usually highly clustered, with only a few links separating any pair of nodes ([Watts and Strogatz, 1998][2]). Consequently, there are relatively few residues interconnecting all residues in the structure. Although protein structures are robust complex systems, they are also fragile to perturbations at key positions ([Taverna and Goldstein, 2002][3]). Experimental studies show that a significant number of single‐site mutations have little effect on protein function, whereas perturbations of key amino acids can abolish protein activity or folding. This robustness is expected to be an intrinsic characteristic of the protein fold.

Viewing protein structures as information processing networks, where the communicated information can be transmitted in a physical (or chemical) form, it would be reasonable to assume that certain amino acids are crucial for network communications. Residues receiving and propagating information are expected to be central in the interaction network, lying on the shortest pathways between most residue pairs in the protein. Although the propagation of information in protein structures is poorly understood, a number of theoretical results have suggested a crucial role for the central residues ([Dokholyan et al., 2002][4]; [Vendruscolo et al., 2002][5]; [Amitai et al., 2004][6]; [del Sol and O'Meara, 2004][7]).

Allostery is based on communication and transmission of information from one functional site to another. In our network representation of protein structures, removal of most vertices (amino acids) together with their edges does not substantially affect the network's interconnectedness, expressed as the average shortest path distance between all pairs of vertices. On the other hand, removal of fold centrally conserved residues (including their links) significantly affects the network's interconnectedness, suggesting that these residues are crucial in preserving short path lengths.
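The node-removal analysis described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network (an eight-node ring with a hypothetical central node `H`), the function names and the choice to average over reachable pairs only are assumptions made for illustration, and a real analysis would build the network from residue contacts in a solved structure.

```python
from collections import deque

def characteristic_path_length(adj, removed=None):
    """Mean shortest-path length over all reachable pairs of remaining nodes.

    adj maps each node to the set of its neighbours (unweighted, undirected).
    Assumption: pairs left unreachable after a removal are simply skipped.
    """
    total, pairs = 0, 0
    for source in adj:
        if source == removed:
            continue
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v != removed and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def path_length_changes(adj):
    """Change in characteristic path length caused by removing each node."""
    base = characteristic_path_length(adj)
    return {n: characteristic_path_length(adj, removed=n) - base for n in adj}

# Hypothetical toy network: a ring of eight "residues" plus a central
# node "H" that contacts every second ring node.
adj = {n: set() for n in list(range(8)) + ["H"]}

def add_edge(u, v):
    adj[u].add(v)
    adj[v].add(u)

for i in range(8):
    add_edge(i, (i + 1) % 8)
for i in (0, 2, 4, 6):
    add_edge("H", i)

base = characteristic_path_length(adj)
changes = path_length_changes(adj)
most_critical = max(changes, key=changes.get)
print(f"baseline L = {base:.3f}, largest increase on removing {most_critical!r}")
```

In this toy case, removing the central node lengthens the average path the most, whereas removing a ring node has a smaller or even negative effect. How large an increase must be to count as significant is left open here.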
We termed these key amino acids ‘interconnectivity determinants’ (ICD). We studied seven allosteric protein families with experimental information on key residues in allosteric communications (myoglobins, G‐protein‐coupled receptors, the trypsin class of serine proteases, hemoglobins, oligosaccharide phosphorylases, nuclear receptor ligand‐binding domains and retroviral proteases). In each case, based on the structural alignment of the protein family, we identified the positions that are ICDs in the structures of most family members (we termed these positions ‘conserved interconnectivity determinants’ or CICD residues; [Figure 2][8]). Our results revealed a general correspondence between the CICDs and experimentally annotated key residues for allosteric communications. Interestingly, some of the CICD residues in four of the analyzed examples (G‐protein‐coupled receptors, the trypsin class of serine proteases, hemoglobins and nuclear receptor ligand‐binding domains) were found to be amino acids involved in the networks of statistically coupled residues as predicted by Ranganathan and co‐workers ([Süel et al., 2002][9]). Thus, our findings show that CICD residues, that is, centrally conserved residues crucial for maintaining shorter path lengths in the protein network, mediate the signaling process in protein families, illustrating that topology plays an important role in network communication.

The myoglobin family deserves special attention owing to the recent findings on the allosteric nature of myoglobin. This protein illustrates that certain characteristics of a protein design may be involved in new functions. Interestingly, all the key residues whose removal significantly elongates the path length in the network correspond to one of three groups: residues binding the heme group; amino acids lining three of the main xenon cavities, which are thus likely to be important for myoglobin allostery; or redox‐active residues, which act in a cooperative way for optimal protein function.
The HIV‐1 protease is also...