An objective evaluation criterion for clustering
- 22 August 2004
- conference paper
- Published by Association for Computing Machinery (ACM)
- pp. 515-520
- https://doi.org/10.1145/1014052.1014112
Abstract
We propose and test an objective criterion for evaluation of clustering performance: How well does a clustering algorithm run on unlabeled data aid a classification algorithm? The accuracy is quantified using the PAC-MDL bound [3] in a semisupervised setting. Clustering algorithms which naturally separate the data according to (hidden) labels with a small number of clusters perform well. A simple extension of the argument leads to an objective model selection method. Experimental results on text analysis datasets demonstrate that this approach empirically results in very competitive bounds on test set performance on natural datasets.
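To make the criterion concrete, the following is a minimal sketch of the underlying idea: score a clustering by labeling each cluster with the majority (hidden) label of its members and measuring the resulting classification accuracy. This illustrates why clusterings that separate the hidden labels with few clusters score well; it is only an illustration and does not compute the PAC-MDL bound itself. The helper name `clustering_accuracy` and the toy data are assumptions, not from the paper.

```python
from collections import Counter

def clustering_accuracy(cluster_ids, labels):
    """Score a clustering by how well it would aid a classifier:
    assign each cluster its majority (hidden) label, then report
    the fraction of points classified correctly under that rule."""
    by_cluster = {}
    for c, y in zip(cluster_ids, labels):
        by_cluster.setdefault(c, []).append(y)
    # Each cluster contributes the count of its most common label.
    correct = sum(Counter(ys).most_common(1)[0][1] for ys in by_cluster.values())
    return correct / len(labels)

# A clustering that separates the two hidden labels perfectly...
print(clustering_accuracy([0, 0, 1, 1], ["a", "a", "b", "b"]))  # 1.0
# ...versus one that mixes the labels across clusters.
print(clustering_accuracy([0, 1, 0, 1], ["a", "a", "b", "b"]))  # 0.5
```

Note that this toy score alone can be gamed by using many tiny clusters; the PAC-MDL bound in the paper penalizes that by charging for the number of clusters.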
This publication has 3 references indexed in Scilit:
- Frequency-Sensitive Competitive Learning for Scalable Balanced Clustering on High-Dimensional Hyperspheres, IEEE Transactions on Neural Networks, 2004
- Generative Model-Based Clustering of Directional Data, published by Association for Computing Machinery (ACM), 2003
- Concept Decompositions for Large Sparse Text Data Using Clustering, Machine Learning, 2001