Is My Network Module Preserved and Reproducible?

Abstract
In many applications, one is interested in determining which of the properties of a network module change across conditions. For example, to validate the existence of a module, it is desirable to show that it is reproducible (or preserved) in an independent test network. Here we study several types of network preservation statistics that do not require a module assignment in the test network. We distinguish network preservation statistics by the type of the underlying network. Some preservation statistics are defined for a general network (defined by an adjacency matrix) while others are only defined for a correlation network (constructed on the basis of pairwise correlations between numeric variables). Our applications show that the correlation structure facilitates the definition of particularly powerful module preservation statistics. We illustrate that evaluating module preservation is in general different from evaluating cluster preservation. We find that it is advantageous to aggregate multiple preservation statistics into summary preservation statistics. We illustrate the use of these methods in six gene co-expression network applications including 1) preservation of cholesterol biosynthesis pathway in mouse tissues, 2) comparison of human and chimpanzee brain networks, 3) preservation of selected KEGG pathways between human and chimpanzee brain networks, 4) sex differences in human cortical networks, 5) sex differences in mouse liver networks. While we find no evidence for sex specific modules in human cortical networks, we find that several human cortical modules are less preserved in chimpanzees. In particular, apoptosis genes are differentially co-expressed between humans and chimpanzees. Our simulation studies and applications show that module preservation statistics are useful for studying differences between the modular structure of networks. Data, R software and accompanying tutorials can be downloaded from the following webpage: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/ModulePreservation. In network applications, one is often interested in studying whether modules are preserved across multiple networks. For example, to determine whether a pathway of genes is perturbed in a certain condition, one can study whether its connectivity pattern is no longer preserved. Non-preserved modules can either be biologically uninteresting (e.g., reflecting data outliers) or interesting (e.g., reflecting sex specific modules). An intuitive approach for studying module preservation is to cross-tabulate module membership. But this approach often cannot address questions about the preservation of connectivity patterns between nodes. Thus, cross-tabulation based approaches often fail to recognize that important aspects of a network module are preserved. Cross-tabulation methods make it difficult to argue that a module is not preserved. The weak statement (“the reference module does not overlap with any of the identified test set modules”) is less relevant in practice than the strong statement (“the module cannot be found in the test network irrespective of the parameter settings of the module detection procedure”). Module preservation statistics have important applications, e.g. we show that the wiring of apoptosis genes in a human cortical network differs from that in chimpanzees.

This publication has 58 references indexed in Scilit: