A prediction-based resampling method for estimating the number of clusters in a dataset
Open Access
- 25 June 2002
- journal article
- research article
- Published by Springer Science and Business Media LLC in Genome Biology
Abstract
Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems, such as the classification of tumors. An important statistical problem associated with tumor classification is the identification of new tumor classes using gene-expression profiles. Two essential aspects of this clustering problem are: to estimate the number of clusters, if any, in a dataset; and to allocate tumor samples to these clusters, and assess the confidence of cluster assignments for individual samples. Here we address the first of these problems. We have developed a new prediction-based resampling method, Clest, to estimate the number of clusters in a dataset. The performance of the new and existing methods was compared using simulated data and gene-expression data from four recently published cancer microarray studies. Clest was generally found to be more accurate and robust than the six existing methods considered in the study. Focusing on prediction accuracy in conjunction with resampling produces accurate and robust estimates of the number of clusters.
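The idea of combining prediction accuracy with resampling can be illustrated with a minimal sketch. It is not the published Clest procedure: the abstract does not spell out the algorithm, and the full method involves further steps (such as a comparison against a reference distribution) that are omitted here. The sketch assumes one-dimensional data, a tiny k-means as the clustering method, a nearest-centroid rule as the classifier, and the Fowlkes-Mallows index as the measure of agreement. All function names (`kmeans_1d`, `nearest`, `fowlkes_mallows`, `prediction_score`) and the toy data are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import mean, median


def nearest(x, centroids):
    """Index of the centroid closest to x (nearest-centroid classifier)."""
    return min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))


def kmeans_1d(xs, k, rng, iters=50):
    """Tiny 1-D k-means (Lloyd's algorithm); returns the sorted centroids."""
    centroids = rng.sample(sorted(set(xs)), k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[nearest(x, centroids)].append(x)
        # Keep the previous centroid if a group happens to be empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return sorted(centroids)


def fowlkes_mallows(a, b):
    """Fowlkes-Mallows similarity between two labelings of the same items."""
    both = same_a = same_b = 0
    for i, j in combinations(range(len(a)), 2):
        in_a, in_b = a[i] == a[j], b[i] == b[j]
        both += in_a and in_b
        same_a += in_a
        same_b += in_b
    if same_a == 0 or same_b == 0:
        return 0.0
    return both / math.sqrt(same_a * same_b)


def prediction_score(xs, k, n_splits=20, seed=0):
    """Median agreement between predicted and clustered test-set labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_splits):
        shuffled = list(xs)
        rng.shuffle(shuffled)
        cut = 2 * len(shuffled) // 3
        learn, test = shuffled[:cut], shuffled[cut:]
        # 1. Cluster the learning set; its centroids act as a classifier.
        learn_centroids = kmeans_1d(learn, k, rng)
        predicted = [nearest(x, learn_centroids) for x in test]
        # 2. Cluster the test set on its own.
        test_centroids = kmeans_1d(test, k, rng)
        clustered = [nearest(x, test_centroids) for x in test]
        # 3. Score how well the two test-set partitions agree.
        scores.append(fowlkes_mallows(predicted, clustered))
    return median(scores)


# Toy data (an assumption): two well-separated groups of 15 points each.
data = [0.1 * i for i in range(15)] + [10 + 0.1 * i for i in range(15)]

for k in (2, 3, 4):
    print(k, prediction_score(data, k))
```

In this sketch, a value of k whose clusters can be reproduced by a classifier on held-out samples scores close to 1, whereas a k that splits the data arbitrarily tends to score lower. How such scores are turned into a final estimate of the number of clusters is left open here.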