Comparison of co-expression measures: mutual information, correlation, and model based indices
Open Access
- 9 December 2012
- journal article
- research article
- Published by Springer Science and Business Media LLC in BMC Bioinformatics
- Vol. 13 (1), 328
- https://doi.org/10.1186/1471-2105-13-328
Abstract
Co-expression measures are often used to define networks among genes. Mutual information (MI) is often used as a generalized correlation measure. It is not clear how much MI adds beyond standard (robust) correlation measures or regression model based association measures. Further, it is important to assess what transformations of these and other co-expression measures lead to biologically meaningful modules (clusters of genes). We provide a comprehensive comparison between mutual information and several correlation measures in 8 empirical data sets and in simulations. We also study different approaches for transforming an adjacency matrix, e.g. using the topological overlap measure.
Overall, we confirm close relationships between MI and correlation in all data sets, which reflects the fact that most gene pairs satisfy linear or monotonic relationships. We discuss rare situations when the two measures disagree. We also compare correlation and MI based approaches when it comes to defining co-expression network modules. We show that a robust measure of correlation (the biweight midcorrelation transformed via the topological overlap transformation) leads to modules that are superior to MI based modules and maximal information coefficient (MIC) based modules in terms of gene ontology enrichment. We present a function that relates correlation to mutual information, which can be used to approximate the mutual information from the corresponding correlation coefficient. We propose the use of polynomial or spline regression models as an alternative to MI for capturing non-linear relationships between quantitative variables.
The biweight midcorrelation outperforms MI in terms of elucidating gene pairwise relationships. Coupled with the topological overlap matrix transformation, it often leads to more significantly enriched co-expression modules. Spline and polynomial networks form attractive alternatives to MI in case of non-linear relationships.
Our results indicate that MI networks can safely be replaced by correlation networks when it comes to measuring co-expression relationships in stationary data.
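The measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names `bicor`, `gaussian_mi`, and `polynomial_r2` are hypothetical; the biweight midcorrelation follows the commonly used definition (median-centred deviations with Tukey biweight weights and a tuning constant of 9); `gaussian_mi` is the textbook identity for a bivariate normal pair and is not necessarily the correlation-to-MI function the paper presents; and the polynomial fit uses the regression R² as a stand-in for a model based association index.

```python
import numpy as np


def bicor(x, y):
    """Biweight midcorrelation of two equal-length 1-D samples."""
    def weighted_deviation(v):
        v = np.asarray(v, dtype=float)
        dev = v - np.median(v)
        mad = np.median(np.abs(dev))  # raw (unscaled) median absolute deviation
        if mad == 0:
            raise ValueError("MAD is zero; bicor is undefined for this input")
        u = dev / (9.0 * mad)
        # Tukey biweight: observations with |u| >= 1 receive weight 0
        w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
        return dev * w

    a, b = weighted_deviation(x), weighted_deviation(y)
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))


def gaussian_mi(r):
    """MI in nats of a bivariate normal pair with correlation r.

    Assumption: textbook Gaussian identity, used here only for illustration.
    """
    return -0.5 * np.log(1.0 - np.asarray(r, dtype=float) ** 2)


def polynomial_r2(x, y, degree=3):
    """R^2 of a least-squares polynomial fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    return float(1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2))


if __name__ == "__main__":
    x = np.arange(20.0)
    y = x.copy()
    y[-1] = 1000.0  # a single gross outlier
    print("Pearson:", np.corrcoef(x, y)[0, 1])
    print("bicor:  ", bicor(x, y))

    t = np.linspace(-1.0, 1.0, 41)
    print("Pearson on y = t^2:  ", np.corrcoef(t, t**2)[0, 1])
    print("Quadratic-fit R^2:   ", polynomial_r2(t, t**2, degree=2))
    print("Gaussian MI at r=0.8:", gaussian_mi(0.8))
```

The first comparison shows why a robust correlation can be preferable: the outlier is down-weighted to zero by the biweight function, whereas it dominates the Pearson coefficient. The second shows a symmetric non-monotonic relationship that correlation misses and a polynomial model captures.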