On landmark selection and sampling in high-dimensional data analysis
Open Access
- 13 November 2009
- Research article
- Published by The Royal Society in Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
- Vol. 367 (1906), 4295-4312
- https://doi.org/10.1098/rsta.2009.0161
Abstract
In recent years, the spectral analysis of appropriately defined kernel matrices has emerged as a principled way to extract the low-dimensional structure often prevalent in high-dimensional data. Here, we provide an introduction to spectral methods for linear and nonlinear dimension reduction, emphasizing ways to overcome the computational limitations currently faced by practitioners with massive datasets. In particular, a data subsampling or landmark selection process is often employed to construct a kernel based on partial information, followed by an approximate spectral analysis termed the Nyström extension. We provide a quantitative framework to analyse this procedure, and use it to demonstrate algorithmic performance bounds on a range of practical approaches designed to optimize the landmark selection process. We compare the practical implications of these bounds by way of real-world examples drawn from the field of computer vision, whereby low-dimensional manifold structure is shown to emerge from high-dimensional video data streams.
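The landmark-based procedure described in the abstract can be illustrated with a minimal sketch of the Nyström extension: sample a subset of landmark points, eigendecompose the small landmark-landmark kernel, and extend its eigenvectors to the full dataset. This is an illustrative implementation assuming uniform random landmark selection (the paper analyses more refined selection strategies); the function name and the toy Gaussian-kernel data are assumptions, not taken from the paper.

```python
import numpy as np

def nystrom_approximation(K, landmarks):
    """Approximate the eigensystem of a symmetric kernel matrix K (n x n)
    using only the columns indexed by `landmarks` (illustrative sketch)."""
    C = K[:, landmarks]          # n x m block of sampled kernel columns
    W = C[landmarks, :]          # m x m kernel among the landmarks
    # Eigendecompose the small landmark kernel.
    evals, U = np.linalg.eigh(W)
    # Discard numerically non-positive eigenvalues for stability.
    keep = evals > 1e-10
    evals, U = evals[keep], U[:, keep]
    # Nystrom-extended eigenvectors: columns of C U diag(1/evals).
    U_ext = C @ U / evals
    # Low-rank reconstruction K ~= C W^+ C^T.
    K_approx = (U_ext * evals) @ U_ext.T
    return evals, U_ext, K_approx

# Toy example: Gaussian kernel on random points, uniform landmark sampling.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / 20.0)
landmarks = rng.choice(200, size=40, replace=False)
evals, U_ext, K_approx = nystrom_approximation(K, landmarks)
rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
```

Because the reconstruction has the form C W⁺ Cᵀ, the landmark-landmark block of `K_approx` reproduces the sampled kernel exactly (up to the eigenvalue cutoff), while the quality of the remaining entries depends on how well the landmarks cover the data, which is precisely what the paper's performance bounds quantify.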