K-means clustering: A half-century synthesis

1 May 2006

journal article
review article
Published by Wiley in British Journal of Mathematical and Statistical Psychology

Vol. 59 (1), 1-34
https://doi.org/10.1348/000711005x48266

Abstract

This paper synthesizes the results, methodology, and research conducted concerning the K-means clustering method over the last fifty years. The K-means method is first introduced, various formulations of the minimum variance loss function and alternative loss functions within the same class are outlined, and different methods of choosing the number of clusters and initialization, variable preprocessing, and data reduction schemes are discussed. Theoretic statistical results are provided and various extensions of K-means using different metrics or modifications of the original algorithm are given, leading to a unifying treatment of K-means and some of its extensions. Finally, several future studies are outlined that could enhance the understanding of numerous subtleties affecting the performance of the K-means method.

Cited by 663 articles