On landmark selection and sampling in high-dimensional data analysis
Open Access
- 13 November 2009
- Research article
- Published by The Royal Society in Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
- Vol. 367 (1906), 4295-4312
- https://doi.org/10.1098/rsta.2009.0161
Abstract
In recent years, the spectral analysis of appropriately defined kernel matrices has emerged as a principled way to extract the low-dimensional structure often prevalent in high-dimensional data. Here, we provide an introduction to spectral methods for linear and nonlinear dimension reduction, emphasizing ways to overcome the computational limitations currently faced by practitioners with massive datasets. In particular, a data subsampling or landmark selection process is often employed to construct a kernel based on partial information, followed by an approximate spectral analysis termed the Nyström extension. We provide a quantitative framework to analyse this procedure, and use it to demonstrate algorithmic performance bounds on a range of practical approaches designed to optimize the landmark selection process. We compare the practical implications of these bounds by way of real-world examples drawn from the field of computer vision, whereby low-dimensional manifold structure is shown to emerge from high-dimensional video data streams.
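The landmark-based procedure described in the abstract can be illustrated with a minimal sketch of the Nyström extension: sample a subset of landmark points, eigendecompose the small landmark-landmark kernel, and extend its eigenvectors to the full dataset. This is an illustrative implementation assuming uniform random landmark selection (the paper analyses more refined selection strategies); the function name and the toy Gaussian-kernel data are assumptions, not taken from the paper.

```python
import numpy as np

def nystrom_approximation(K, landmarks):
    """Approximate the eigensystem of a symmetric kernel matrix K (n x n)
    using only the columns indexed by `landmarks` (illustrative sketch)."""
    C = K[:, landmarks]          # n x m block of sampled kernel columns
    W = C[landmarks, :]          # m x m kernel among the landmarks
    # Eigendecompose the small landmark kernel.
    evals, U = np.linalg.eigh(W)
    # Discard numerically non-positive eigenvalues for stability.
    keep = evals > 1e-10
    evals, U = evals[keep], U[:, keep]
    # Nystrom-extended eigenvectors: columns of C U diag(1/evals).
    U_ext = C @ U / evals
    # Low-rank reconstruction K ~= C W^+ C^T.
    K_approx = (U_ext * evals) @ U_ext.T
    return evals, U_ext, K_approx

# Toy example: Gaussian kernel on random points, uniform landmark sampling.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / 20.0)
landmarks = rng.choice(200, size=40, replace=False)
evals, U_ext, K_approx = nystrom_approximation(K, landmarks)
rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
```

Because the reconstruction has the form C W⁺ Cᵀ, the landmark-landmark block of `K_approx` reproduces the sampled kernel exactly (up to the eigenvalue cutoff), while the quality of the remaining entries depends on how well the landmarks cover the data, which is precisely what the paper's performance bounds quantify.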