An Algorithm for Euclidean Sum of Squares Classification

Abstract
An analysis of surface pollen samples to discover if they fall naturally into distinct groups of similar samples is an example of a classification problem. In Euclidean classification, a set of n objects can be represented as n points in Euclidean space of p dimensions. The sum of squares criterion defines the optimal partition of the points into g disjoint groups to be the partition which minimizes the total within-group sum of squared distances about the g centroids. It is not usually feasible to examine all possible partitions of the objects into g groups. A critical review is made of algorithms which have been proposed for seeking optimal partitions. The problem is reformulated in non-linear programming terms, and a new algorithm for seeking the minimum sum of squares is described. The performance of this algorithm in analyzing the pollen data is found to compare well with the performance of three of the existing algorithms. An efficient hybrid algorithm is introduced.

This publication has 3 references indexed in Scilit: