Identifying Cognate Binding Pairs among a Large Set of Paralogs: The Case of PE/PPE Proteins of Mycobacterium tuberculosis

Open Access

12 September 2008

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 4 (9), e1000174
https://doi.org/10.1371/journal.pcbi.1000174

Abstract

We consider the problem of how to detect cognate pairs of proteins that bind when each belongs to a large family of paralogs. To illustrate the problem, we have undertaken a genomewide analysis of interactions of members of the PE and PPE protein families of Mycobacterium tuberculosis. Our computational method uses structural information, operon organization, and protein coevolution to infer the interaction of PE and PPE proteins. Some 289 PE/PPE complexes were predicted out of a possible 5,590 PE/PPE pairs genomewide. Thirty-five of these predicted complexes were also found to have correlated mRNA expression, providing additional evidence for these interactions. We show that our method is applicable to other protein families, by analyzing interactions of the Esx family of proteins. Our resulting set of predictions is a starting point for genomewide experimental interaction screens of the PE and PPE families, and our method may be generally useful for detecting interactions of proteins within families having many paralogs.

We consider the problem of detecting protein interactions from genome sequences when the potential interacting partners belong to large families of similar (homologous) proteins. Many computational methods for predicting protein interactions rely on similarity to a pair of known interacting proteins. When the proteins in question are members of large groups of similar proteins within the same organism (paralogs), the problem of inferring the correct interactions becomes difficult. To illustrate the problem, we undertook prediction of interactions of some highly expanded protein families of Mycobacterium tuberculosis (Mtb), which are believed to contribute to the bacterium's ability to infect human beings. To generate predictions, we analyzed patterns of coevolution in a small subset of likely interacting proteins, and extended these patterns to predict additional interactions throughout the genome. Our results provide a map for experimental probes of the Mtb interaction network, for the benefit of drug and vaccine discovery. More generally, our procedure is applicable to detecting interactions of proteins that belong to large families of paralogs in any organism with a sequenced genome.
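The abstract's supporting check, that some predicted complexes also show correlated mRNA expression, can be illustrated with a minimal sketch. It assumes each gene has an expression profile measured across the same set of conditions and scores each predicted pair by Pearson correlation. The gene labels, expression values, and the 0.8 cutoff are hypothetical placeholders, not values or procedures taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(predicted_pairs, profiles, threshold=0.8):
    """Keep predicted pairs whose expression correlation meets the cutoff.

    The threshold is an illustrative assumption, not the paper's criterion.
    """
    kept = []
    for pe, ppe in predicted_pairs:
        r = pearson(profiles[pe], profiles[ppe])
        if r >= threshold:
            kept.append((pe, ppe, r))
    return kept


# Hypothetical expression profiles over four conditions.
profiles = {
    "PE_a": [1.0, 2.0, 3.0, 4.0],
    "PPE_a": [2.1, 3.9, 6.2, 8.0],
    "PPE_b": [4.0, 1.0, 3.5, 2.0],
}
predicted = [("PE_a", "PPE_a"), ("PE_a", "PPE_b")]
supported = coexpressed_pairs(predicted, profiles)
for pe, ppe, r in supported:
    print(f"{pe} / {ppe}: r = {r:.3f}")
```

In this toy input only the first pair passes the cutoff. With real data, profiles would come from an expression compendium, and the cutoff would need to be chosen against a background distribution of correlations between non-interacting pairs.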
