Comparison of Clustering Methods for Investigation of Genome-Wide Methylation Array Data
Open Access
- 1 January 2011
- journal article
- Published by Frontiers Media SA in Frontiers in Genetics
- Vol. 2, 88
- https://doi.org/10.3389/fgene.2011.00088
Abstract
The use of genome-wide methylation arrays has proved very informative for investigating both clinical and biological questions in human epigenomics. Clustering methods can be used either to explore these data or to compare them with an a priori grouping (e.g., normal versus disease). In both cases, they allow groupings in the data to be assessed without user bias. However, no consensus has been reached on which clustering methods to use for methylation array data. To determine the most appropriate clustering method for the analysis of Illumina methylation array data, a collection of data sets was simulated and used to compare clustering methods. Both hierarchical and non-hierarchical clustering methods (k-means, k-medoids, and fuzzy clustering algorithms) were compared using a range of distance and linkage methods. As no single method consistently outperformed the others across the different simulations, we propose a method that captures the best clustering outcome based on an additional measure, the silhouette width. This approach produced consistently higher cluster accuracy than any one method used in isolation.
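The abstract describes the selection procedure only in outline. The Python sketch below illustrates the general idea: run several clustering methods and keep the partition with the highest mean silhouette width. It rests on several assumptions that are not taken from the paper:

- β-values are simulated from Beta distributions, with Beta(8, 2) versus Beta(2, 8) for differentially methylated CpGs and Beta(2, 2) for uninformative ones.
- The candidate methods are single, average and complete linkage with Euclidean or Manhattan distance, plus k-means from three random starts. k-medoids and fuzzy clustering are left out for brevity.
- Every partition is scored with Euclidean distance.

All function names are hypothetical, and the code is a sketch rather than the authors' implementation.

```python
import math
import random
from itertools import combinations

# Illustrative sketch: every function name below is hypothetical.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def simulate(n_per_group=10, n_cpg=30, n_dm=10, seed=1):
    """Two sample groups; only the first n_dm CpGs differ between them (assumed design)."""
    rng = random.Random(seed)
    data, truth = [], []
    for group in (0, 1):
        a, b = (8, 2) if group == 0 else (2, 8)  # mostly methylated vs mostly unmethylated
        for _ in range(n_per_group):
            # Differentially methylated CpGs first, then uninformative Beta(2, 2) CpGs.
            data.append([rng.betavariate(a, b) if j < n_dm else rng.betavariate(2, 2)
                         for j in range(n_cpg)])
            truth.append(group)
    return data, truth

def hierarchical(data, k, dist, linkage="average"):
    """Naive agglomerative clustering, stopped when k clusters remain."""
    d = {(i, j): dist(data[i], data[j])
         for i, j in combinations(range(len(data)), 2)}
    def between(c1, c2):
        ds = [d[(min(i, j), max(i, j))] for i in c1 for j in c2]
        if linkage == "single":
            return min(ds)
        if linkage == "complete":
            return max(ds)
        return sum(ds) / len(ds)  # average linkage
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > k:
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: between(clusters[pq[0]], clusters[pq[1]]))
        clusters[p].extend(clusters.pop(q))  # q > p, so index p is unaffected
    return [next(c for c, m in enumerate(clusters) if i in m) for i in range(len(data))]

def kmeans(data, k, seed, iters=100):
    """Lloyd's k-means (Euclidean), starting from k randomly chosen samples."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: euclidean(p, centers[c])) for p in data]
        if new == labels:
            break  # assignments stopped changing
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(data, labels, dist):
    """Mean silhouette width, s(i) = (b - a) / max(a, b); singletons count as 0."""
    n, clusters = len(data), set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for one cluster; sentinel so it is never selected
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue
        a = sum(dist(data[i], data[j]) for j in own) / len(own)
        b = min(sum(dist(data[i], data[j]) for j in range(n) if labels[j] == c)
                / labels.count(c) for c in clusters if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n

def best_by_silhouette(data, k):
    """Run every candidate method and keep the widest-silhouette partition."""
    candidates = {}
    for dname, dist in (("euclidean", euclidean), ("manhattan", manhattan)):
        for link in ("single", "average", "complete"):
            candidates[f"hclust-{link}-{dname}"] = hierarchical(data, k, dist, link)
    for seed in range(3):
        candidates[f"kmeans-seed{seed}"] = kmeans(data, k, seed)
    # Score every partition with the same metric so the widths are comparable.
    scores = {name: silhouette(data, lab, euclidean) for name, lab in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best], scores

def accuracy_two_groups(labels, truth):
    """Fraction of samples correctly grouped, allowing for swapped labels."""
    agree = sum(lab == t for lab, t in zip(labels, truth))
    return max(agree, len(truth) - agree) / len(truth)

if __name__ == "__main__":
    data, truth = simulate()
    name, labels, scores = best_by_silhouette(data, k=2)
    print(name, round(scores[name], 3), accuracy_two_groups(labels, truth))
```

Scoring every candidate with a single distance is a choice made here so that silhouette widths from different metrics share one scale. The abstract does not state how the authors handled this, so treat it as one possible option.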