A Clustering Scheme for Multispectral Images

Abstract
A clustering scheme using a multidimensional histogram stored in a table is described and tested on four-dimensional data derived from LANDSAT imagery. By performing all clustering operations on the histogram rather than on the original measurement vectors, the computation is reduced by a large factor, making it possible to handle the large sample sizes typically encountered in image processing. The clustering algorithm first isolates and delineates the peaks in the multidimensional histogram. These peaks are then used as cluster centers, and all other measurement vectors in the histogram are assigned to the nearest center. The scheme initially identifies the most separable clusters in the data. It then runs interactively, allowing the user to split specific clusters into subclusters at the expense of reduced separability. The histogram approach lends itself to statistical analysis using parametric models and the likelihood ratio test. As a starting point, the observed distribution is assumed to be a mixture of several multivariate Gaussian distributions with unknown mean vectors, covariance matrices, and a priori probabilities. The Gaussian parameters are estimated while ignoring the overlap between neighboring distributions. The theoretical histogram is then calculated by numerically integrating the probability density function over each cell of the histogram, and the likelihood ratio test is applied to measure the departure of the model from the observed data. A statistical measure that takes the number of degrees of freedom into account is defined and used to choose between alternative models.
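
As an illustration only, the histogram-based peak finding and nearest-center assignment outlined above might be sketched in Python as follows. This is a minimal sketch, not the implementation used in this work: the function names, the `cell_size` and `min_count` parameters, the strict local-maximum rule for peaks, and the use of squared Euclidean distance in cell coordinates are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def build_histogram(vectors, cell_size=1):
    """Count measurement vectors per histogram cell (sparse table keyed by cell index)."""
    return Counter(tuple(int(x // cell_size) for x in v) for v in vectors)


def find_peaks(hist, min_count=1):
    """Return occupied cells whose count strictly exceeds that of every adjacent cell.

    Assumed peak rule: a strict local maximum over the 3^d - 1 neighboring cells.
    Flat plateaus (ties between adjacent cells) are not reported as peaks.
    """
    dims = len(next(iter(hist)))
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]
    peaks = []
    for cell, count in hist.items():
        if count < min_count:
            continue
        neighbors = (tuple(c + d for c, d in zip(cell, o)) for o in offsets)
        if all(hist.get(n, 0) < count for n in neighbors):
            peaks.append(cell)
    return peaks


def assign_cells(hist, peaks):
    """Label each occupied cell with the index of its nearest peak (assumed metric:
    squared Euclidean distance between cell indices)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return {
        cell: min(range(len(peaks)), key=lambda k: dist2(cell, peaks[k]))
        for cell in hist
    }


# Hypothetical four-dimensional measurement vectors forming two groups.
vectors = (
    [(10, 10, 10, 10)] * 5
    + [(11, 10, 10, 10)] * 2
    + [(40, 40, 40, 40)] * 6
    + [(40, 41, 40, 40)] * 1
)
hist = build_histogram(vectors)
peaks = find_peaks(hist)
labels = assign_cells(hist, peaks)
```

Because the loops run over occupied histogram cells rather than over individual measurement vectors, the cost depends on the number of distinct cells, which is the source of the saving mentioned above. The interactive splitting of clusters and the Gaussian mixture and likelihood ratio analysis are not covered by this sketch.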