NOISE-ROBUST SOFT CLUSTERING OF GENE EXPRESSION TIME-COURSE DATA


1 August 2005

journal article
research article
Published by World Scientific Pub Co Pte Ltd in Journal of Bioinformatics and Computational Biology

Vol. 03 (04), 965-988
https://doi.org/10.1142/s0219720005001375

Abstract

Clustering is an important tool in microarray data analysis. This unsupervised learning technique is commonly used to reveal structures hidden in large gene expression data sets. The vast majority of clustering algorithms applied so far produce hard partitions of the data, i.e. each gene is assigned to exactly one cluster. Hard clustering is favourable if clusters are well separated. However, this is generally not the case for microarray time-course data, where gene clusters frequently overlap. Additionally, hard clustering algorithms are often highly sensitive to noise. To overcome the limitations of hard clustering, we applied soft clustering, which offers several advantages for researchers. First, it generates accessible internal cluster structures, i.e. it indicates how well each cluster represents the genes assigned to it. This can be used for a more targeted search for regulatory elements. Second, the overall relation between clusters, and thus a global clustering structure, can be defined. Additionally, soft clustering is more robust to noise, so a priori filtering of genes can be avoided. This prevents the exclusion of biologically relevant genes from the data analysis. Soft clustering was implemented here using the fuzzy c-means algorithm. Procedures to find optimal clustering parameters were developed. A software package for soft clustering has been developed based on the open-source statistical language R. The package, called Mfuzz, is freely available.
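
To illustrate what soft clustering with fuzzy c-means produces, the following is a minimal sketch in plain Python. It is not the Mfuzz implementation and is not taken from the paper. The function names, the toy expression profiles, the fuzzifier value m = 2 and the deterministic farthest-point initialisation are all assumptions made for illustration. Each gene receives a membership value for every cluster, and the values for one gene sum to 1, so a profile lying between two clusters shows up with split memberships instead of being forced into one group.

```python
import math


def memberships_for(data, centroids, m):
    """Fuzzy c-means membership update: u_ij is proportional to d_ij^(-2/(m-1))."""
    u = []
    for x in data:
        dist = [math.dist(x, v) for v in centroids]
        zeros = [j for j, d in enumerate(dist) if d == 0.0]
        if zeros:
            # The point sits exactly on a centroid: share full membership there.
            row = [1.0 / len(zeros) if j in zeros else 0.0
                   for j in range(len(centroids))]
        else:
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            row = [w / total for w in inv]  # each row sums to 1
        u.append(row)
    return u


def farthest_point_init(data, c):
    """Assumed initialisation (for reproducibility): start from the first
    profile, then repeatedly add the profile farthest from those chosen."""
    centroids = [list(data[0])]
    while len(centroids) < c:
        nxt = max(data, key=lambda x: min(math.dist(x, v) for v in centroids))
        centroids.append(list(nxt))
    return centroids


def fuzzy_c_means(data, c, m=2.0, max_iter=200, tol=1e-8):
    """Return (centroids, memberships) for c soft clusters; requires m > 1."""
    dim = len(data[0])
    centroids = farthest_point_init(data, c)
    for _ in range(max_iter):
        u = memberships_for(data, centroids, m)
        # Centroid update: mean of all profiles weighted by membership**m.
        new_centroids = []
        for j in range(c):
            weights = [row[j] ** m for row in u]
            wsum = sum(weights)
            new_centroids.append(
                [sum(w * x[d] for w, x in zip(weights, data)) / wsum
                 for d in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:  # stop once the centroids no longer move
            break
    return centroids, memberships_for(data, centroids, m)


# Hypothetical toy time courses (4 time points): three rising genes,
# three falling genes, and one flat gene that fits neither pattern well.
profiles = [
    [1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [0.9, 1.8, 3.2, 3.9],
    [4.0, 3.0, 2.0, 1.0], [4.1, 2.9, 2.1, 0.8], [3.8, 3.1, 1.9, 1.2],
    [2.5, 2.5, 2.5, 2.5],
]

centroids, memberships = fuzzy_c_means(profiles, c=2)
for profile, row in zip(profiles, memberships):
    print(profile, [round(value, 3) for value in row])
```

In this sketch the rising and falling profiles receive a membership close to 1 for their own cluster, while the flat profile is split roughly evenly between the two. Membership values of this kind are what allow poorly represented genes to be identified after clustering, without filtering them out beforehand.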


Cited by 411 articles