Identifying Cognate Binding Pairs among a Large Set of Paralogs: The Case of PE/PPE Proteins of Mycobacterium tuberculosis
Open Access
- 12 September 2008
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 4 (9), e1000174
- https://doi.org/10.1371/journal.pcbi.1000174
Abstract
We consider the problem of how to detect cognate pairs of proteins that bind when each belongs to a large family of paralogs. To illustrate the problem, we have undertaken a genomewide analysis of interactions of members of the PE and PPE protein families of Mycobacterium tuberculosis. Our computational method uses structural information, operon organization, and protein coevolution to infer the interaction of PE and PPE proteins. Some 289 PE/PPE complexes were predicted out of a possible 5,590 PE/PPE pairs genomewide. Thirty-five of these predicted complexes were also found to have correlated mRNA expression, providing additional evidence for these interactions. We show that our method is applicable to other protein families, by analyzing interactions of the Esx family of proteins. Our resulting set of predictions is a starting point for genomewide experimental interaction screens of the PE and PPE families, and our method may be generally useful for detecting interactions of proteins within families having many paralogs.
We consider the problem of detecting protein interactions from genome sequences when the potential interacting partners belong to large families of similar (homologous) proteins. Many computational methods for predicting protein interactions rely on similarity to a pair of known interacting proteins. When the proteins in question are members of large groups of similar proteins within the same organism (paralogs), the problem of inferring the correct interactions becomes difficult. To illustrate the problem, we undertook prediction of interactions of some highly expanded protein families of Mycobacterium tuberculosis (Mtb), which are believed to contribute to the bacterium's ability to infect human beings. To generate predictions, we analyzed patterns of coevolution in a small subset of likely interacting proteins, and extended these patterns to predict additional interactions throughout the genome. Our results provide a map for experimental probes of the Mtb interaction network, for the benefit of drug and vaccine discovery. More generally, our procedure is applicable to detecting interactions of proteins that belong to large families of paralogs in any organism with a sequenced genome.
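The abstract does not specify how coevolution was scored, so the following is only a generic illustration of the idea, not the authors' procedure. It computes the mutual information between one alignment column from each of two protein families, assuming the sequences have already been paired row by row (for example, PE and PPE genes from the same operon). The function name and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_a, col_b):
    """Mutual information, in bits, between two alignment columns.

    col_a and col_b hold one residue per paired sequence, in the same
    row order. This is a generic coevolution score used for illustration;
    it is an assumption, not the scoring scheme of the paper.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of rows")
    n = len(col_a)
    if n == 0:
        raise ValueError("columns must not be empty")

    count_a = Counter(col_a)                # residue counts in family A
    count_b = Counter(col_b)                # residue counts in family B
    count_ab = Counter(zip(col_a, col_b))   # joint counts over paired rows

    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(c * n / (count_a[a] * count_b[b]))
    return mi


# Toy columns (hypothetical data): residues that change together
# score high, residues that vary independently score zero.
covarying = column_mutual_information("AAVV", "LLII")
independent = column_mutual_information("AAVV", "LILI")
```

In a scheme of this kind, column pairs that score highly among the likely interacting pairs could then be used to rank candidate pairings among the remaining paralogs; how the paper actually does this is described in its methods, not in the abstract.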