“Guilt by Association” Is the Exception Rather Than the Rule in Gene Networks
Open Access
- 29 March 2012
- journal article
- research article
- Published by Public Library of Science (PLoS) in PLoS Computational Biology
- Vol. 8 (3), e1002444
- https://doi.org/10.1371/journal.pcbi.1002444
Abstract
Gene networks are commonly interpreted as encoding functional information in their connections. An extensively validated principle called guilt by association states that genes which are associated or interacting are more likely to share function. Guilt by association provides the central top-down principle for analyzing gene networks in functional terms or assessing their quality in encoding functional information. In this work, we show that functional information within gene networks is typically concentrated in only a very few interactions whose properties cannot be reliably related to the rest of the network. In effect, the apparent encoding of function within networks has been largely driven by outliers whose behaviour cannot even be generalized to individual genes, let alone to the network at large. While experimentalist-driven analysis of interactions may use prior expert knowledge to focus on the small fraction of critically important data, large-scale computational analyses have typically assumed that high-performance cross-validation in a network is due to a generalizable encoding of function. Because we find that gene function is not systemically encoded in networks, but dependent on specific and critical interactions, we conclude it is necessary to focus on the details of how networks encode function and what information computational analyses use to extract functional meaning. We explore a number of consequences of this and find that network structure itself provides clues as to which connections are critical and that systemic properties, such as scale-free-like behaviour, do not map onto the functional connectivity within networks.
Author Summary
The analysis of gene function and gene networks is a major theme of post-genome biomedical research. Historically, many attempts to understand gene function leverage a biological principle known as “guilt by association” (GBA). GBA states that genes with related functions tend to share properties such as genetic or physical interactions. In the past ten years, GBA has been scaled up for application to large gene networks, becoming a favored way to grapple with the complex interdependencies of gene functions in the face of floods of genomics and proteomics data. However, there is a growing realization that scaled-up GBA is not a panacea. In this study, we report a precise identification of the limits of GBA and show that it cannot provide a way to understand gene networks in a way that is simultaneously general and useful. Our findings indicate that the assumptions underlying the high-throughput use of gene networks to interpret function are fundamentally flawed, with wide-ranging implications for the interpretation of genome-wide data.
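As an illustration of the ideas summarized above, the following is a minimal sketch of one common way to operationalize guilt by association, namely neighbour voting scored by AUROC. It is not taken from the article: the toy network, the gene names, and the helper functions `neighbor_voting_scores`, `auroc`, and `without_edge` are all hypothetical and chosen only to show how a single interaction can carry most of the apparent functional signal.

```python
from typing import Dict, Set

Network = Dict[str, Set[str]]


def neighbor_voting_scores(adjacency: Network, annotated: Set[str]) -> Dict[str, float]:
    """Score each gene by the fraction of its neighbours that carry the function.

    A gene's own label is never used for its own score, so scoring every gene
    this way behaves like leave-one-out cross-validation.
    """
    scores = {}
    for gene, neighbours in adjacency.items():
        if not neighbours:
            scores[gene] = 0.0  # isolated genes receive no votes
        else:
            votes = sum(1 for n in neighbours if n in annotated)
            scores[gene] = votes / len(neighbours)
    return scores


def auroc(scores: Dict[str, float], positives: Set[str]) -> float:
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def without_edge(adjacency: Network, a: str, b: str) -> Network:
    """Return a copy of the network with the undirected edge a-b removed."""
    return {g: {n for n in nbrs if {g, n} != {a, b}} for g, nbrs in adjacency.items()}


# Hypothetical toy network: a chain A - B - C - D, with A and B sharing a function.
network: Network = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
labelled = {"A", "B"}

baseline = auroc(neighbor_voting_scores(network, labelled), labelled)
perturbed = auroc(neighbor_voting_scores(without_edge(network, "A", "B"), labelled), labelled)

print(baseline)   # 0.875
print(perturbed)  # 0.25
```

In this toy case, removing the single edge A-B drops performance from well above chance to well below it, which is the kind of dependence on a few critical interactions that the abstract describes. Real analyses would use far larger networks and annotation sets, so the numbers here carry no biological meaning.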