Isotropic PCA and Affine-Invariant Clustering
- 1 October 2008
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2008 49th Annual IEEE Symposium on Foundations of Computer Science
- p. 551-560
- https://doi.org/10.1109/focs.2008.48
Abstract
We present an extension of principal component analysis (PCA) and a new algorithm for clustering points in R^n based on it. The key property of the algorithm is that it is affine-invariant. When the input is a sample from a mixture of two arbitrary Gaussians, the algorithm correctly classifies the sample assuming only that the two components are separable by a hyperplane, i.e., there exists a halfspace that contains most of one Gaussian and almost none of the other in probability mass. This is nearly the best possible, improving known results substantially. For k > 2 components, the algorithm requires only that there be some (k-1)-dimensional subspace in which the "overlap" in every direction is small. Our main tools are isotropic transformation, spectral projection and a simple reweighting technique. We call this combination isotropic PCA.
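The three ingredients named in the abstract — isotropic transformation, reweighting, and spectral projection — can be illustrated with a minimal sketch. This is not the authors' exact algorithm (the paper's reweighting scheme and stopping conditions differ); the Gaussian weight function and the bandwidth parameter `sigma` below are illustrative assumptions.

```python
import numpy as np

def isotropic_pca_direction(X, sigma=1.0):
    """Sketch of the isotropic-PCA pipeline: whiten, reweight, project.

    X : (n_samples, n_features) data matrix.
    Returns a candidate separating direction, the whitening matrix,
    and the sample mean.
    """
    # Step 1: isotropic transformation -- center the data and rescale
    # so the sample covariance becomes the identity (affine-invariance
    # comes from working in this normalized frame).
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)           # cov assumed full rank
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T  # inverse square root
    Y = (X - mu) @ W

    # Step 2: reweighting -- here a Gaussian weight centered at the
    # origin (an illustrative choice, not the paper's exact scheme).
    w = np.exp(-np.linalg.norm(Y, axis=1) ** 2 / (2.0 * sigma ** 2))
    w /= w.sum()

    # Step 3: spectral projection -- take the top eigenvector of the
    # reweighted second-moment matrix as the candidate direction.
    M = (Y * w[:, None]).T @ Y
    _, evecs = np.linalg.eigh(M)
    return evecs[:, -1], W, mu
```

Because the data are first made isotropic, any invertible affine map applied to the input changes only the whitening step, which is why the overall procedure behaves affine-invariantly.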