Network methods for describing sample relationships in genomic datasets: application to Huntington’s disease

Open Access

1 January 2012

journal article
research article
Published by Springer Science and Business Media LLC in BMC Systems Biology

Vol. 6 (1), 63
https://doi.org/10.1186/1752-0509-6-63

Abstract

Genomic datasets generated by new technologies are increasingly prevalent in disparate areas of biological research. While many studies have sought to characterize relationships among genomic features, commensurate efforts to characterize relationships among biological samples have been less common. Consequently, the full extent of sample variation in genomic studies is often under-appreciated, complicating downstream analytical tasks such as gene co-expression network analysis. Here we demonstrate the use of network methods for characterizing sample relationships in microarray data generated from human brain tissue. We describe an approach for identifying outlying samples that does not depend on the choice or use of clustering algorithms. We introduce a battery of measures for quantifying the consistency and integrity of sample relationships, which can be compared across disparate studies, technology platforms, and biological systems. Among these measures, we provide evidence that the correlation between the connectivity and the clustering coefficient (two important network concepts) is a sensitive indicator of homogeneity among biological samples. We also show that this measure, which we refer to as cor(K,C), can distinguish biologically meaningful relationships among subgroups of samples. Specifically, we find that cor(K,C) reveals the profound effect of Huntington’s disease on samples from the caudate nucleus relative to other brain regions. Furthermore, we find that this effect is concentrated in specific modules of genes that are naturally co-expressed in human caudate nucleus, highlighting a new strategy for exploring the effects of disease on sets of genes. These results underscore the importance of systematically exploring sample relationships in large genomic datasets before seeking to analyze genomic feature activity. We introduce a standardized platform for this purpose using freely available R software that has been designed to enable iterative and interactive exploration of sample networks.
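
To make the cor(K,C) measure concrete, the following is a minimal Python sketch and not the authors' R software. It rests on several assumptions that the abstract does not state: samples are linked by a signed weighted adjacency a_ij = ((1 + cor_ij) / 2)^beta with beta = 2, connectivity K is the row sum of the adjacency matrix, and the clustering coefficient C uses the standard weighted-network generalization. The function names, the toy data, and the use of a standardized connectivity score to flag a possible outlier are all illustrative choices.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def sample_adjacency(samples, beta=2):
    """Assumed signed weighted adjacency between samples:
    a_ij = ((1 + cor(x_i, x_j)) / 2) ** beta, with a zero diagonal."""
    n = len(samples)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a = ((1 + pearson(samples[i], samples[j])) / 2) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def connectivity(adj):
    """K_i: sum of adjacencies between sample i and all other samples."""
    return [sum(row) for row in adj]


def clustering_coefficient(adj):
    """Weighted clustering coefficient of each sample:
    C_i = sum_{j != k} a_ij a_jk a_ki / ((sum_j a_ij)^2 - sum_j a_ij^2)."""
    n = len(adj)
    coeffs = []
    for i in range(n):
        num = sum(adj[i][j] * adj[j][k] * adj[k][i]
                  for j in range(n) for k in range(n)
                  if j != i and k != i and j != k)
        den = sum(adj[i]) ** 2 - sum(a * a for a in adj[i])
        coeffs.append(num / den if den > 0 else 0.0)
    return coeffs


def cor_kc(adj):
    """cor(K,C): correlation between connectivity and clustering coefficient."""
    return pearson(connectivity(adj), clustering_coefficient(adj))


def standardized_connectivity(adj):
    """Z-score of each sample's connectivity (hypothetical outlier score)."""
    k = connectivity(adj)
    mean, sd = statistics.fmean(k), statistics.stdev(k)
    return [(v - mean) / sd for v in k]


# Toy data (made up): rows are samples, columns are features.
# The last sample is deliberately unlike the other four.
toy = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.1, 2.2, 2.9, 4.1, 5.3, 5.8],
    [0.9, 1.8, 3.2, 3.9, 4.8, 6.1],
    [1.2, 2.1, 3.1, 4.3, 4.9, 6.2],
    [6.0, 1.0, 5.0, 2.0, 4.0, 3.0],
]
toy_adj = sample_adjacency(toy)
print("cor(K,C):", cor_kc(toy_adj))
print("standardized connectivity:", standardized_connectivity(toy_adj))
```

In this sketch, a sample with strongly negative standardized connectivity is a candidate outlier, and no clustering step is involved. Any cutoff for flagging samples would be a choice for the analyst; the abstract does not specify one.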

