Diffusion maps and coarse-graining: a unified framework for dimensionality reduction, graph partitioning, and data set parameterization
Top Cited Papers
- 24 July 2006
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence
- Vol. 28 (9), 1393-1403
- https://doi.org/10.1109/tpami.2006.184
Abstract
We provide evidence that nonlinear dimensionality reduction, clustering, and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to noise. Our construction, which is based on a Markov random walk on the data, offers a general scheme of simultaneously reorganizing and subsampling graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators. The objective of data partitioning and clustering is to coarse-grain the random walk on the data while at the same time preserving a diffusion operator for the intrinsic geometry or connectivity of the data set up to some accuracy. We show that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithmsKeywords
This publication has 17 references indexed in Scilit:
- Diffusion waveletsApplied and Computational Harmonic Analysis, 2006
- Diffusion maps, spectral clustering and reaction coordinates of dynamical systemsApplied and Computational Harmonic Analysis, 2006
- A Novel Way of Computing Similarities between Nodes of a Graph, with Application to Collaborative RecommendationPublished by Institute of Electrical and Electronics Engineers (IEEE) ,2005
- Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion mapsProceedings of the National Academy of Sciences of the United States of America, 2005
- Iterative Denoising for Cross-Corpus DiscoveryPublished by Springer Science and Business Media LLC ,2004
- A Global Geometric Framework for Nonlinear Dimensionality ReductionScience, 2000
- Normalized cuts and image segmentationIeee Transactions On Pattern Analysis and Machine Intelligence, 2000
- Segmentation using eigenvectors: a unifying viewPublished by Institute of Electrical and Electronics Engineers (IEEE) ,1999
- Nonlinear Component Analysis as a Kernel Eigenvalue ProblemNeural Computation, 1998
- The Nature of Statistical Learning TheoryPublished by Springer Science and Business Media LLC ,1995