Analysis of domain correlations in yeast protein complexes

Abstract
Motivation: A growing body of research has focused on identifying and defining conserved sequence motifs. These conserved sequence and structural units are widely recognized to often mediate protein functions and interactions. As high-throughput experiments continue to advance, computational methods are needed to critically assess their results. In this work, we analyzed protein complexes from high-throughput experiments using the domain composition of their constituent proteins, on the premise that domains mediating similar or related functions may consistently co-occur in protein complexes.

Results: We analyzed Saccharomyces cerevisiae protein complexes from curated and high-throughput experimental datasets to identify statistically significant functional associations between domains. The resulting correlations are represented as domain networks, which form the basis for comparing the datasets with one another and with binary protein interactions. The curated datasets produce domain networks that map to known biological assemblies, such as the ribosome, RNA polymerase, proteasome regulators, transcription initiation and histones. Many of these domain correlations were also found in binary protein interactions. In contrast, the high-throughput datasets yield a single large network of domain associations. The high connectivity of RNA processing and RNA binding domains in the high-throughput datasets reflects the abundance of RNA binding proteins in yeast, in agreement with a previous report that identified a nucleolar protein cluster, possibly mediated by rRNA, from these complexes.

Availability: The software is available from the authors upon request and depends on the NCBI C++ Toolkit.
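To make the idea of a domain correlation concrete, the following is a minimal Python sketch of one way to score domain co-occurrence across complexes. It is not the authors' software: the abstract does not name the statistical test, so the one-sided hypergeometric test used here is an assumption, as are the function names (`hypergeom_sf`, `domain_network`), the significance threshold, and the toy protein and domain labels. For simplicity, the sketch counts two domains as co-occurring whenever both appear anywhere in the same complex.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def domain_network(complexes, protein_domains, alpha=0.05):
    """Return {(domain_a, domain_b): p_value} for significantly co-occurring pairs.

    complexes: list of protein-ID lists (one list per complex).
    protein_domains: dict mapping protein ID -> set of domain names.
    alpha: significance cutoff (an assumed value, no multiple-testing correction).
    """
    # Collapse each complex to the set of domains carried by its proteins.
    domain_sets = [
        set().union(*(protein_domains.get(p, set()) for p in members))
        for members in complexes
    ]
    n_complexes = len(domain_sets)

    # Count complexes containing each domain and each domain pair.
    single, pair = {}, {}
    for domains in domain_sets:
        for d in domains:
            single[d] = single.get(d, 0) + 1
        for a, b in combinations(sorted(domains), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1

    # Keep pairs that co-occur more often than expected by chance.
    edges = {}
    for (a, b), both in pair.items():
        p = hypergeom_sf(both, n_complexes, single[a], single[b])
        if p < alpha:
            edges[(a, b)] = p
    return edges


# Toy data (hypothetical): domains A and B always appear together.
toy_domains = {"P1": {"A"}, "P2": {"B"}, "P3": {"C"}, "P4": {"D"}}
toy_complexes = [["P1", "P2"]] * 3 + [["P3"]] * 3 + [["P4"]] * 2
toy_network = domain_network(toy_complexes, toy_domains)
```

In the toy data, A and B share all three of their complexes out of eight, so the pair becomes the only edge of the network. A real analysis would also need a correction for multiple testing, which this sketch omits.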