Dynamically weighted clustering with noise set

Open Access

9 December 2009

journal article
research article
Published by Oxford University Press (OUP) in Bioinformatics

Vol. 26 (3), 341-347
https://doi.org/10.1093/bioinformatics/btp671

Abstract

Motivation: Various clustering methods have been applied to microarray gene expression data to identify genes with similar expression profiles. As biological annotation data have accumulated, more and more genes have been organized into functional categories. Functionally related genes may be regulated by common cellular signals and are thus likely to be co-expressed. Consequently, there is great interest in using rapidly growing functional annotation resources such as Gene Ontology (GO) to improve the performance of clustering methods. Conversely, some genes have distinct expression profiles and do not co-express with other genes. Identifying these scattered genes could also enhance the performance of clustering methods.

Results: We developed a new clustering algorithm, Dynamically Weighted Clustering with Noise set (DWCN), which makes use of gene annotation information and allows a set of scattered genes, the noise set, to be left out of the main clusters. We tested the DWCN method and compared its results with those of several common clustering techniques on a simulated dataset and on two public datasets: the Stanford yeast cell-cycle gene expression data and a gene expression dataset for a group of genetically different yeast segregants.

Conclusion: Our method produces clusters with more consistent functional annotations and more coherent expression patterns than existing clustering techniques.

Contact: yshen@stat.ucla.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
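The noise-set idea mentioned in the Results can be illustrated with a small sketch. This is not the DWCN algorithm: the abstract does not specify how annotation information enters the weighting, so that part is omitted entirely. What follows is a plain Lloyd-style k-means in which any point farther than a threshold from every centroid is left unclustered. The function name `kmeans_with_noise`, the parameter `noise_radius`, and the toy data are all assumptions made for illustration.

```python
import math


def kmeans_with_noise(points, init_centroids, noise_radius, n_iter=20):
    """Illustrative k-means variant with a noise set (NOT the published DWCN).

    A point whose nearest centroid is farther than `noise_radius` gets the
    label -1 and does not contribute to any centroid update.
    """
    centroids = [list(c) for c in init_centroids]
    k = len(centroids)

    def assign():
        labels = []
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            j = min(range(k), key=dists.__getitem__)
            # Too far from every centroid: put the point in the noise set.
            labels.append(j if dists[j] <= noise_radius else -1)
        return labels

    for _ in range(n_iter):
        labels = assign()
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if the cluster is empty
                centroids[j] = [sum(x) / len(members) for x in zip(*members)]

    # Final assignment so the labels match the returned centroids.
    return assign(), centroids


if __name__ == "__main__":
    # Hypothetical toy "expression profiles": two tight groups, one outlier.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, -50)]
    labels, centers = kmeans_with_noise(data, [(0, 0), (10, 10)], noise_radius=3.0)
    print(labels)
    print(centers)
```

Passing the initial centroids explicitly keeps the sketch deterministic. A fixed distance threshold is the simplest possible noise rule and is used here only to show the idea of leaving scattered points out of the main clusters.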

