Judging the Quality of Gene Expression-Based Clustering Methods Using Gene Annotation

Open Access

1 October 2002

journal article
research article
Published by Cold Spring Harbor Laboratory in Genome Research

Vol. 12 (10), 1574-1581
https://doi.org/10.1101/gr.397002

Abstract

We compare several commonly used expression-based gene clustering algorithms using a figure of merit based on the mutual information between cluster membership and known gene attributes. By studying various publicly available expression data sets we conclude that enrichment of clusters for biological function is, in general, highest at rather low cluster numbers. As a measure of dissimilarity between the expression patterns of two genes, no method outperforms Euclidean distance for ratio-based measurements, or Pearson distance for non-ratio-based measurements at the optimal choice of cluster number. We show the self-organized-map approach to be best for both measurement types at higher numbers of clusters. Clusters of genes derived from single- and average-linkage hierarchical clustering tend to produce worse-than-random results. [The algorithm described is available at http://llama.med.harvard.edu, under Software.]
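The quantities named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and the abstract does not spell out the exact figure of merit; the code only shows the underlying ingredients under stated assumptions: mutual information between a cluster assignment and one known gene attribute, and the two dissimilarity measures mentioned. The function names are hypothetical, and "Pearson distance" is assumed here to mean one minus the Pearson correlation coefficient.

```python
import math
from collections import Counter


def mutual_information(clusters, attribute):
    """Mutual information, in bits, between two equal-length label sequences.

    `clusters` holds one cluster label per gene; `attribute` holds one
    annotation value per gene (for example True/False for a functional class).
    """
    if len(clusters) != len(attribute):
        raise ValueError("label sequences must have the same length")
    n = len(clusters)
    joint = Counter(zip(clusters, attribute))  # counts of (cluster, attribute) pairs
    count_c = Counter(clusters)                # marginal counts per cluster
    count_a = Counter(attribute)               # marginal counts per attribute value
    mi = 0.0
    for (c, a), count in joint.items():
        # p(c, a) * log2( p(c, a) / (p(c) * p(a)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (count_c[c] * count_a[a]))
    return mi


def euclidean_distance(x, y):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """Assumed definition: 1 - Pearson correlation of two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Toy data: four genes, two clusters, one binary attribute.
    clusters = [0, 0, 1, 1]
    informative = [True, True, False, False]    # attribute tracks the clusters
    uninformative = [True, False, True, False]  # attribute independent of clusters
    print(mutual_information(clusters, informative))
    print(mutual_information(clusters, uninformative))
```

In the toy data, an attribute that perfectly tracks a two-way balanced clustering carries 1 bit of mutual information, while an attribute distributed independently of the clusters carries 0 bits. A comparison of clustering methods along the lines of the abstract would presumably aggregate such values over many attributes and compare them with randomized cluster assignments, but those details are not given here.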
