Is My Network Module Preserved and Reproducible?

Open Access

20 January 2011

journal article
research article
Published by Public Library of Science (PLoS) in PLoS Computational Biology

Vol. 7 (1), e1001057
https://doi.org/10.1371/journal.pcbi.1001057

Abstract

In many applications, one is interested in determining which of the properties of a network module change across conditions. For example, to validate the existence of a module, it is desirable to show that it is reproducible (or preserved) in an independent test network. Here we study several types of network preservation statistics that do not require a module assignment in the test network. We distinguish network preservation statistics by the type of the underlying network. Some preservation statistics are defined for a general network (defined by an adjacency matrix) while others are only defined for a correlation network (constructed on the basis of pairwise correlations between numeric variables). Our applications show that the correlation structure facilitates the definition of particularly powerful module preservation statistics. We illustrate that evaluating module preservation is in general different from evaluating cluster preservation. We find that it is advantageous to aggregate multiple preservation statistics into summary preservation statistics. We illustrate the use of these methods in six gene co-expression network applications including 1) preservation of cholesterol biosynthesis pathway in mouse tissues, 2) comparison of human and chimpanzee brain networks, 3) preservation of selected KEGG pathways between human and chimpanzee brain networks, 4) sex differences in human cortical networks, and 5) sex differences in mouse liver networks. While we find no evidence for sex-specific modules in human cortical networks, we find that several human cortical modules are less preserved in chimpanzees. In particular, apoptosis genes are differentially co-expressed between humans and chimpanzees. Our simulation studies and applications show that module preservation statistics are useful for studying differences between the modular structure of networks.
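As a rough illustration of a preservation statistic for correlation networks, the sketch below correlates a module's pairwise correlations in the reference data with the same pairs in the test data. It needs only the reference module's node list, not a module assignment in the test network. The function names, the toy data, and the choice of this particular statistic are assumptions made for illustration; the sketch is not the accompanying R software and should not be read as the authors' exact definition.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def module_correlations(data, module):
    """Correlation for every unordered pair of module nodes.

    `data` maps a node name to its list of sample values.
    """
    return [pearson(data[a], data[b]) for a, b in combinations(module, 2)]


def correlation_preservation(ref_data, test_data, module):
    """Hypothetical statistic: correlate the module's pairwise correlations
    in the reference data with the same pairs in the test data."""
    return pearson(module_correlations(ref_data, module),
                   module_correlations(test_data, module))


# Toy reference data (made up for this example): four nodes, five samples each.
ref_data = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.0, 1.0, 4.0, 3.0, 6.0],
    "g3": [5.0, 3.0, 4.0, 1.0, 2.0],
    "g4": [1.0, 3.0, 2.0, 5.0, 4.0],
}
# Toy test data: a rescaled copy, so every pairwise correlation is unchanged
# and the correlation pattern is perfectly preserved.
test_data = {g: [2.0 * v + 3.0 for v in values] for g, values in ref_data.items()}

module = ["g1", "g2", "g3", "g4"]
score = correlation_preservation(ref_data, test_data, module)
```

Here `score` is close to 1 because the test data keep the reference correlation pattern exactly. In real data one would still have to judge how large a value is noteworthy, for instance by comparing it against values obtained for randomly chosen node sets of the same size; that step is omitted here.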
Data, R software and accompanying tutorials can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ModulePreservation.

In network applications, one is often interested in studying whether modules are preserved across multiple networks. For example, to determine whether a pathway of genes is perturbed in a certain condition, one can study whether its connectivity pattern is no longer preserved. Non-preserved modules can either be biologically uninteresting (e.g., reflecting data outliers) or interesting (e.g., reflecting sex-specific modules). An intuitive approach for studying module preservation is to cross-tabulate module membership. But this approach often cannot address questions about the preservation of connectivity patterns between nodes. Thus, cross-tabulation-based approaches often fail to recognize that important aspects of a network module are preserved. Cross-tabulation methods also make it difficult to argue that a module is not preserved. The weak statement (“the reference module does not overlap with any of the identified test set modules”) is less relevant in practice than the strong statement (“the module cannot be found in the test network irrespective of the parameter settings of the module detection procedure”). Module preservation statistics have important applications; for example, we show that the wiring of apoptosis genes in a human cortical network differs from that in chimpanzees.
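The cross-tabulation approach mentioned above can be sketched in a few lines. The example counts how many nodes fall into each pair of (reference module, test module) and reports the best-matching test module for one reference module. The function names, node names, and module labels are invented for illustration. Note that this approach needs a module assignment in the test network and looks only at membership, not at the connectivity pattern between nodes.

```python
from collections import Counter


def cross_tabulate(ref_labels, test_labels):
    """Count nodes for each (reference module, test module) pair.

    Both arguments map node name -> module label; only nodes present
    in both assignments are counted.
    """
    shared = ref_labels.keys() & test_labels.keys()
    return Counter((ref_labels[n], test_labels[n]) for n in shared)


def best_overlap(table, ref_module):
    """Return (test module, count) with the largest overlap, or None."""
    overlaps = {t: c for (r, t), c in table.items() if r == ref_module}
    if not overlaps:
        return None
    return max(overlaps.items(), key=lambda item: item[1])


# Hypothetical module assignments in a reference and a test network.
ref_labels = {"g1": "blue", "g2": "blue", "g3": "blue", "g4": "red", "g5": "red"}
test_labels = {"g1": "A", "g2": "A", "g3": "B", "g4": "B", "g5": "B"}

table = cross_tabulate(ref_labels, test_labels)
blue_match = best_overlap(table, "blue")
```

In this toy case the "blue" reference module overlaps most with test module "A" (two of its three nodes). A small overlap would support only the weak statement quoted above, since a different parameter setting of the module detection procedure could yield a different test assignment.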
