“Guilt by Association” Is the Exception Rather Than the Rule in Gene Networks

Open Access

29 March 2012

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 8 (3), e1002444
https://doi.org/10.1371/journal.pcbi.1002444

Abstract

Gene networks are commonly interpreted as encoding functional information in their connections. An extensively validated principle called guilt by association states that genes which are associated or interacting are more likely to share function. Guilt by association provides the central top-down principle for analyzing gene networks in functional terms or assessing their quality in encoding functional information. In this work, we show that functional information within gene networks is typically concentrated in only a very few interactions whose properties cannot be reliably related to the rest of the network. In effect, the apparent encoding of function within networks has been largely driven by outliers whose behaviour cannot even be generalized to individual genes, let alone to the network at large. While experimentalist-driven analysis of interactions may use prior expert knowledge to focus on the small fraction of critically important data, large-scale computational analyses have typically assumed that high-performance cross-validation in a network is due to a generalizable encoding of function. Because we find that gene function is not systemically encoded in networks, but dependent on specific and critical interactions, we conclude it is necessary to focus on the details of how networks encode function and what information computational analyses use to extract functional meaning. We explore a number of consequences of this and find that network structure itself provides clues as to which connections are critical and that systemic properties, such as scale-free-like behaviour, do not map onto the functional connectivity within networks.

Author Summary

The analysis of gene function and gene networks is a major theme of post-genome biomedical research. Historically, many attempts to understand gene function leverage a biological principle known as “guilt by association” (GBA). GBA states that genes with related functions tend to share properties such as genetic or physical interactions. In the past ten years, GBA has been scaled up for application to large gene networks, becoming a favored way to grapple with the complex interdependencies of gene functions in the face of floods of genomics and proteomics data. However, there is a growing realization that scaled-up GBA is not a panacea. In this study, we report a precise identification of the limits of GBA and show that it cannot provide a way to understand gene networks in a way that is simultaneously general and useful. Our findings indicate that the assumptions underlying the high-throughput use of gene networks to interpret function are fundamentally flawed, with wide-ranging implications for the interpretation of genome-wide data.
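The claim that cross-validated performance can rest on a handful of edges is easy to see on a toy example. The sketch below is not the authors' code. It assumes an unweighted, undirected network without self-loops, scores each gene by the fraction of its neighbours annotated with a function (a simple neighbour-voting form of GBA), evaluates the scores with leave-one-out cross-validation and AUROC, and then removes each edge in turn to measure how much of that performance the edge carries. The function names, the six-gene network and its labels are hypothetical illustrations.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def auroc(scores: List[float], labels: List[bool]) -> float:
    """Probability that a random positive gene outscores a random negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def neighbor_voting_auroc(genes: List[str], edges: List[Edge], positives: Set[str]) -> float:
    """Leave-one-out neighbour voting on an unweighted, undirected network.

    Each gene is scored by the fraction of its neighbours that are annotated.
    Assuming no self-loops, a gene's own label never enters its own score,
    so scoring every gene this way is equivalent to leave-one-out cross-validation.
    """
    adj: Dict[str, Set[str]] = {g: set() for g in genes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    scores = [
        sum(n in positives for n in adj[g]) / len(adj[g]) if adj[g] else 0.0
        for g in genes
    ]
    labels = [g in positives for g in genes]
    return auroc(scores, labels)


def edge_contributions(genes: List[str], edges: List[Edge], positives: Set[str]) -> List[Tuple[Edge, float]]:
    """Drop in AUROC when each edge is removed on its own, largest drop first."""
    baseline = neighbor_voting_auroc(genes, edges, positives)
    drops = []
    for i, edge in enumerate(edges):
        rest = edges[:i] + edges[i + 1:]
        drops.append((edge, baseline - neighbor_voting_auroc(genes, rest, positives)))
    return sorted(drops, key=lambda d: d[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical six-gene network; only g0 and g1 carry the function.
    genes = ["g0", "g1", "g2", "g3", "g4", "g5"]
    edges = [("g0", "g1"), ("g0", "g2"), ("g1", "g3"),
             ("g2", "g4"), ("g3", "g5"), ("g4", "g5")]
    positives = {"g0", "g1"}
    print(neighbor_voting_auroc(genes, edges, positives))  # 0.75
    print(edge_contributions(genes, edges, positives)[0])  # (('g0', 'g1'), 0.5)
```

In this toy network, removing the single edge between the two annotated genes drops the AUROC from 0.75 to 0.25, while removing any other edge changes it by at most 0.1875. Ranking edges by this leave-one-edge-out drop is one simple way to probe whether performance is spread across the network or concentrated in a few critical interactions.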

Cited by 178 articles