A randomized algorithm for principal component analysis

Preprint

12 September 2008

Published in arXiv

http://arxiv.org/abs/0809.2274v2

Abstract

Principal component analysis (PCA) requires the computation of a low-rank approximation to a matrix containing the data being analyzed. In many applications of PCA, the best possible accuracy of any rank-deficient approximation is at most a few digits (measured in the spectral norm, relative to the spectral norm of the matrix being approximated). In such circumstances, existing efficient algorithms do not guarantee good accuracy for the approximations they produce, unless one or both dimensions of the matrix being approximated are small. We describe an efficient algorithm for the low-rank approximation of matrices that produces accuracy very close to the best possible, for matrices of arbitrary sizes. We illustrate our theoretical results via several numerical examples.
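The abstract does not spell out the algorithm. For illustration only, here is a minimal NumPy sketch of a generic randomized low-rank approximation: a Gaussian range finder followed by a few power iterations, a standard technique in this family. It should not be read as the paper's exact method. The function name `randomized_low_rank` and its parameters `oversample`, `n_iter`, and `seed` are hypothetical. The accuracy check follows the abstract's measure, the spectral norm of A - U diag(s) Vt. By the Eckart-Young theorem no rank-k approximation can do better than sigma_{k+1}, the (k+1)-th largest singular value of A.

```python
import numpy as np


def randomized_low_rank(A, k, oversample=10, n_iter=2, seed=0):
    """Rank-k approximation A ~= U @ diag(s) @ Vt via a randomized range finder.

    Hypothetical helper: `oversample` and `n_iter` are illustrative defaults,
    not parameters taken from the paper.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = min(k + oversample, m, n)  # number of random samples of the range of A
    # Sample the range of A with a Gaussian test matrix, then orthonormalize.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, l)))
    # Power iterations apply (A A^T) repeatedly to sharpen the singular-value
    # decay; re-orthonormalizing at every step limits round-off error.
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    # Project A onto the captured subspace and take a small dense SVD.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]


# Usage: a 200 x 100 test matrix with known singular values 1, 1/2, 1/4, ...
rng = np.random.default_rng(1)
m, n, k = 200, 100, 5
X, _ = np.linalg.qr(rng.standard_normal((m, n)))  # orthonormal columns
W, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal matrix
sigma = 0.5 ** np.arange(n)
A = (X * sigma) @ W.T

U, s, Vt = randomized_low_rank(A, k)
err = np.linalg.norm(A - (U * s) @ Vt, 2)  # spectral-norm error
# err / sigma[k] is at least 1 (Eckart-Young); a value near 1 means near-optimal.
print(err / sigma[k])
```

The oversampling and power iterations are the knobs that trade extra matrix-vector products for accuracy closer to sigma_{k+1}. In this sketch they are fixed small constants chosen for the example.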

Keywords

PRINCIPAL COMPONENT ANALYSIS
RANDOMIZED ALGORITHM
EFFICIENT ALGORITHMS
RANK APPROXIMATION
SPECTRAL NORM
MATRICES
LOW RANK
PCA

Other Versions

Version 1, 2008-09-12, preprints
Version 2, 2008-09-12, preprints
Version 3, 2008-09-12, preprints
Version 4, 2008-09-12, preprints