NetVLAD: CNN Architecture for Weakly Supervised Place Recognition
- 1 June 2016
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 5297-5307
- https://doi.org/10.1109/cvpr.2016.572
Abstract
We tackle the problem of large-scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. We present the following three principal contributions. First, we develop a convolutional neural network (CNN) architecture that is trainable in an end-to-end manner directly for the place recognition task. The main component of this architecture, NetVLAD, is a new generalized VLAD layer, inspired by the "Vector of Locally Aggregated Descriptors" image representation commonly used in image retrieval. The layer is readily pluggable into any CNN architecture and amenable to training via backpropagation. Second, we develop a training procedure, based on a new weakly supervised ranking loss, to learn parameters of the architecture in an end-to-end manner from images depicting the same places over time downloaded from Google Street View Time Machine. Finally, we show that the proposed architecture significantly outperforms non-learnt image representations and off-the-shelf CNN descriptors on two challenging place recognition benchmarks, and improves over current state-of-the-art compact image representations on standard image retrieval benchmarks.
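The abstract names two components without spelling them out: a VLAD-style pooling layer and a weakly supervised ranking loss. The NumPy sketch below illustrates one plausible reading of both, as a forward pass only. It assumes soft assignment of local descriptors to clusters via a softmax, residual aggregation per cluster, and a hinge loss that uses the closest of several candidate positives. The function names, argument names, array shapes, and margin value are illustrative assumptions, not taken from this record.

```python
import numpy as np


def netvlad_pool(features, centers, weights, biases, eps=1e-12):
    """VLAD-style pooling with soft assignment (illustrative sketch).

    features: (N, D) local descriptors from a CNN feature map
    centers:  (K, D) cluster centres
    weights:  (K, D) and biases: (K,) parameters of the soft assignment
    Returns a single L2-normalised vector of length K * D.
    """
    # Soft-assign each descriptor to the K clusters with a stable softmax.
    logits = features @ weights.T + biases              # (N, K)
    logits = logits - logits.max(axis=1, keepdims=True)
    assign = np.exp(logits)
    assign = assign / assign.sum(axis=1, keepdims=True)

    # Residual of every descriptor to every centre, weighted and summed.
    residuals = features[:, None, :] - centers[None, :, :]   # (N, K, D)
    vlad = (assign[:, :, None] * residuals).sum(axis=0)      # (K, D)

    # Normalise each cluster's row, flatten, then normalise the whole vector.
    vlad = vlad / (np.linalg.norm(vlad, axis=1, keepdims=True) + eps)
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + eps)


def weak_ranking_loss(query, positives, negatives, margin=0.1):
    """Hinge ranking loss using the best-matching candidate positive.

    query: (D,), positives: (P, D) candidate positives, negatives: (M, D).
    The margin value is an arbitrary placeholder.
    """
    d_pos = ((positives - query) ** 2).sum(axis=1).min()  # closest positive
    d_neg = ((negatives - query) ** 2).sum(axis=1)        # all negatives
    return np.maximum(0.0, d_pos + margin - d_neg).sum()


# Hypothetical usage with random inputs.
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 8))
descriptor = netvlad_pool(
    feats,
    centers=rng.normal(size=(4, 8)),
    weights=rng.normal(size=(4, 8)),
    biases=np.zeros(4),
)
```

Taking the minimum over candidate positives reflects the weak supervision: images that are geographically close to the query are only potential matches, so the sketch lets the closest one in descriptor space stand in for the true positive. In a trainable version, the centres, weights, and biases would be learned by backpropagation in a framework with automatic differentiation.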
This publication has 54 references indexed in Scilit:
- Visual query expansion with or without geometry: Refining local descriptors by feature aggregation. Pattern Recognition, 2014
- Painting-to-3D model alignment via discriminative visual elements. ACM Transactions on Graphics, 2014
- Descriptor Learning Using Convex Optimisation. Lecture Notes in Computer Science, 2012
- A review of multi-instance learning assumptions. The Knowledge Engineering Review, 2010
- Feature Tracking for Wide-Baseline Image Retrieval. Lecture Notes in Computer Science, 2010
- FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance. The International Journal of Robotics Research, 2008
- Local Invariant Feature Detectors: A Survey. Foundations and Trends® in Computer Graphics and Vision, 2007
- Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 2004
- Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998
- Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1989