Isotropic PCA and Affine-Invariant Clustering

Abstract
We present an extension of principal component analysis (PCA) and a new algorithm for clustering points in $\mathbb{R}^n$ based on it. The key property of the algorithm is that it is affine-invariant. When the input is a sample from a mixture of two arbitrary Gaussians, the algorithm correctly classifies the sample assuming only that the two components are separable by a hyperplane, i.e., there exists a halfspace that contains most of one Gaussian and almost none of the other in probability mass. This is nearly the best possible, improving known results substantially. For $k > 2$ components, the algorithm requires only that there be some $(k-1)$-dimensional subspace in which the "overlap" in every direction is small. Our main tools are isotropic transformation, spectral projection and a simple reweighting technique. We call this combination isotropic PCA.
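To make the first of these tools concrete, the following is a minimal sketch of an isotropic transformation only, assuming NumPy and a sample whose covariance has full rank. It does not reproduce the reweighting or spectral projection steps, which the abstract names but does not specify. The function name `make_isotropic`, the toy mixture, and the matrix `M` are all hypothetical choices made for illustration.

```python
import numpy as np

def make_isotropic(points):
    """Affinely map a sample so it has zero mean and identity covariance."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    # Symmetric inverse square root of the covariance via eigendecomposition
    # (assumes the covariance is full rank, so all eigenvalues are positive).
    eigvals, eigvecs = np.linalg.eigh(cov)
    inv_sqrt = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return centered @ inv_sqrt

rng = np.random.default_rng(0)
# Hypothetical toy data: two Gaussians in R^3 separated along the first axis.
a = rng.normal(size=(500, 3)) * [0.3, 2.0, 1.0] + [-2.0, 0.0, 0.0]
b = rng.normal(size=(500, 3)) * [0.3, 1.0, 3.0] + [2.0, 0.0, 0.0]
sample = np.vstack([a, b])
iso = make_isotropic(sample)

# Apply an arbitrary invertible affine map to the input and repeat.
M = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])
shift = np.array([5.0, -1.0, 2.0])
iso_mapped = make_isotropic(sample @ M.T + shift)

# The two isotropic samples agree up to a rotation, so all pairwise
# inner products (the Gram matrices) coincide.
same_geometry = np.allclose(iso @ iso.T, iso_mapped @ iso_mapped.T, atol=1e-8)
```

The Gram-matrix comparison shows why this step is a natural starting point for an affine-invariant method: after the transformation, any invertible affine change of the input is reduced to a rotation, which later rotation-invariant steps would not be affected by.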
