An objective evaluation criterion for clustering

22 August 2004

conference paper
Published by Association for Computing Machinery (ACM)

pp. 515-520
https://doi.org/10.1145/1014052.1014112

Abstract

We propose and test an objective criterion for evaluating clustering performance: how well does a clustering algorithm, run on unlabeled data, aid a classification algorithm? The accuracy is quantified using the PAC-MDL bound [3] in a semisupervised setting. Clustering algorithms that naturally separate the data according to the (hidden) labels, using a small number of clusters, perform well. A simple extension of the argument yields an objective model selection method. Experiments on text analysis datasets show that this approach gives very competitive bounds on test set performance on natural datasets.
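To make the criterion concrete, here is a minimal sketch in the spirit of the approach, not the paper's exact computation. It clusters all points without using labels. Each cluster then takes the majority label of its labeled members, and a transductive hypergeometric-tail bound on the test error is computed. Several details are assumptions made here for illustration: majority-vote labeling, a description length of k·log2(L) bits for labeling k clusters with L classes (entering as δ·2^(-bits)), and splitting δ evenly across candidate clusterings for model selection. The function names (`hypergeom_cdf`, `max_test_errors`, `cluster_label_bound`, `select_clustering`) and the toy data are hypothetical. Cluster assignments are taken as given, so that the sketch stays standard-library only.

```python
import math
from collections import Counter


def hypergeom_cdf(a, total_errors, m, n):
    """P[X <= a], where X counts how many of `total_errors` mislabeled points
    land in a uniformly random training subset of size m out of m + n points."""
    N = m + n
    lo = max(0, total_errors - n)  # fewest errors that must fall in training
    hi = min(a, total_errors, m)
    s = sum(math.comb(total_errors, x) * math.comb(N - total_errors, m - x)
            for x in range(lo, hi + 1))
    return s / math.comb(N, m)


def max_test_errors(m, n, train_errors, delta):
    """Largest test-error count b not ruled out at level delta, i.e. the largest
    b with P[train errors <= observed | total = train_errors + b] >= delta."""
    best = 0
    for b in range(n + 1):
        if hypergeom_cdf(train_errors, train_errors + b, m, n) >= delta:
            best = b
        else:
            break  # this tail probability only shrinks as b grows
    return best


def cluster_label_bound(clusters, train_labels, num_labels, delta):
    """Bound on the test error rate of 'label each cluster by the majority vote
    of its labeled points'. clusters: cluster id for every point;
    train_labels: {point index: label} for the labeled subset only."""
    votes = {}
    for i, y in train_labels.items():
        votes.setdefault(clusters[i], Counter())[y] += 1
    # Clusters with no labeled point get label 0 (an arbitrary assumption).
    assign = {c: votes[c].most_common(1)[0][0] if c in votes else 0
              for c in set(clusters)}
    train_errors = sum(assign[clusters[i]] != y for i, y in train_labels.items())
    m = len(train_labels)
    n = len(clusters) - m
    # Assumed description length: k * log2(num_labels) bits for the cluster labels.
    bits = len(assign) * math.log2(num_labels)
    return max_test_errors(m, n, train_errors, delta * 2 ** (-bits)) / n


def select_clustering(candidates, train_labels, num_labels, delta=0.05):
    """Model selection: pick the clustering with the smallest bound, splitting
    delta evenly across candidates (a union bound, assumed here)."""
    scored = {name: cluster_label_bound(cl, train_labels, num_labels,
                                        delta / len(candidates))
              for name, cl in candidates.items()}
    best = min(scored, key=scored.get)
    return best, scored[best]


# Hypothetical toy data: 12 points, the first 6 labeled, two hidden classes.
train_labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
candidates = {
    "k=1": [0] * 12,
    "k=2": [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
}
print(select_clustering(candidates, train_labels, num_labels=2))
```

The two-cluster candidate matches the hidden labels and makes no training errors, so it receives the smaller bound. Test labels are never consulted. On such a tiny toy set the bound is very loose, and it tightens only as the number of points grows.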
