An efficient k-means clustering algorithm: analysis and implementation

Top Cited Papers

Abstract

In k-means clustering, we are given a set of n data points in d-dimensional space R/sup d/ and an integer k and the problem is to determine a set of k points in Rd, called centers, so as to minimize the mean squared distance from each data point to its nearest center. A popular heuristic for k-means clustering is Lloyd's (1982) algorithm. We present a simple and efficient implementation of Lloyd's k-means clustering algorithm, which we call the filtering algorithm. This algorithm is easy to implement, requiring a kd-tree as the only major data structure. We establish the practical efficiency of the filtering algorithm in two ways. First, we present a data-sensitive analysis of the algorithm's running time, which shows that the algorithm runs faster as the separation between clusters increases. Second, we present a number of empirical studies both on synthetically generated data and on real data sets from applications in color quantization, data compression, and image segmentation.

Keywords

This publication has 22 references indexed in Scilit:

Learning mixtures of Gaussians
Published by Institute of Electrical and Electronics Engineers (IEEE) ,2003
The analysis of a simple k -means clustering algorithm
Published by Association for Computing Machinery (ACM) ,2000
Statistical pattern recognition: a review
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000
Centroidal Voronoi Tessellations: Applications and Algorithms
SIAM Review, 1999
BIRCH: A New Data Clustering Algorithm and Its Applications
Data Mining and Knowledge Discovery, 1997
Experimental results of randomized clustering algorithm
Published by Association for Computing Machinery (ACM) ,1996
Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering
Published by Association for Computing Machinery (ACM) ,1994
Vector Quantization and Signal Compression
Published by Springer Science and Business Media LLC ,1992
Geometric clusterings
Journal of Algorithms, 1991
A spatial filtering approach to texture analysis
Pattern Recognition Letters, 1985

Cited by 3649 articles