Associating Genes and Protein Complexes with Disease via Network Propagation

Open Access

15 January 2010

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 6 (1), e1000641
https://doi.org/10.1371/journal.pcbi.1000641

Abstract

A fundamental challenge in human health is the identification of disease-causing genes. Recently, several studies have tackled this challenge via a network-based approach, motivated by the observation that genes causing the same or similar diseases tend to lie close to one another in a network of protein-protein or functional interactions. However, most of these approaches use only local network information in the inference process and are restricted to inferring single gene associations. Here, we provide a global, network-based method for prioritizing disease genes and inferring protein complex associations, which we call PRINCE. The method is based on formulating constraints on the prioritization function that relate to its smoothness over the network and its use of prior information. We exploit this function to predict not only genes but also protein complex associations with a disease of interest. We test our method on gene-disease association data, evaluating both the prioritization achieved and the protein complexes inferred. We show that our method outperforms extant approaches in both tasks. Using data on 1,369 diseases from the OMIM knowledgebase, our method ranks the true causal gene first for 34% of the diseases in a cross-validation setting, and infers 139 disease-related complexes that are highly coherent in terms of the function, expression and conservation of their member proteins. Importantly, we apply our method to study three multi-factorial diseases for which some causal genes have already been found: prostate cancer, Alzheimer's disease and type 2 diabetes mellitus. PRINCE's predictions for these diseases closely match the known literature and suggest several novel causal genes and protein complexes for further investigation.

Understanding the genetic background of diseases is crucial to medical research, with implications for diagnosis, treatment and drug development. As molecular approaches to this challenge are time-consuming and costly, computational approaches offer an efficient alternative. Such approaches aim at prioritizing genes in a genomic interval of interest according to their predicted strength of association with a given disease. State-of-the-art prioritization methods are based on the observation that genes causing similar diseases tend to lie close to one another in a network of protein-protein interactions. Here we develop a novel prioritization approach that uses the network data in a global manner and can tie not only single genes but also whole protein machineries to a given disease. Our method, PRINCE, is shown to outperform previous methods in both the gene prioritization task and the protein complex task. Applying PRINCE to prostate cancer, Alzheimer's disease and type 2 diabetes, we are able to infer new causal genes and related protein complexes with high confidence.
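
The abstract describes the prioritization function only in terms of two constraints: smoothness over the network and agreement with prior information. It does not give the update rule. As a minimal sketch, assuming a standard iterative propagation scheme in which each gene's score is a weighted blend of its neighbors' scores (smoothness) and its own prior (prior information), the idea can be illustrated as follows. The function name `propagate`, the parameter `alpha`, the symmetric degree normalization and the toy network are all illustrative assumptions, not details taken from this page.

```python
import math


def propagate(adjacency, prior, alpha=0.9, tol=1e-9, max_iter=1000):
    """Iterate F <- alpha * W' F + (1 - alpha) * Y until scores stabilize.

    adjacency: dict mapping node -> {neighbor: edge weight}, assumed symmetric.
    prior:     dict mapping node -> prior score Y (missing nodes count as 0).
    alpha:     assumed trade-off between smoothness (high) and prior (low).
    """
    nodes = sorted(adjacency)
    degree = {u: sum(adjacency[u].values()) for u in nodes}
    # Assumed normalization: W'[u][v] = W[u][v] / sqrt(d(u) * d(v)).
    norm = {
        u: {v: w / math.sqrt(degree[u] * degree[v])
            for v, w in adjacency[u].items()}
        for u in nodes
    }
    scores = {u: prior.get(u, 0.0) for u in nodes}
    for _ in range(max_iter):
        updated = {
            u: alpha * sum(w * scores[v] for v, w in norm[u].items())
            + (1 - alpha) * prior.get(u, 0.0)
            for u in nodes
        }
        change = max(abs(updated[u] - scores[u]) for u in nodes)
        scores = updated
        if change < tol:
            break
    return scores


# Hypothetical toy network: a path A - B - C - D where only A is a
# known disease gene (prior 1.0); all other priors default to 0.
network = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = propagate(network, {"A": 1.0})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy setting the scores decay with network distance from the known gene, so candidates would be prioritized by how strongly the prior signal reaches them through the whole network rather than only through direct neighbors.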
