NetVLAD: CNN Architecture for Weakly Supervised Place Recognition


1 June 2016

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 5297–5307
https://doi.org/10.1109/cvpr.2016.572

Abstract

We tackle the problem of large-scale visual place recognition, where the task is to quickly and accurately recognize the location of a given query photograph. We present the following three principal contributions. First, we develop a convolutional neural network (CNN) architecture that is trainable in an end-to-end manner directly for the place recognition task. The main component of this architecture, NetVLAD, is a new generalized VLAD layer, inspired by the "Vector of Locally Aggregated Descriptors" image representation commonly used in image retrieval. The layer is readily pluggable into any CNN architecture and amenable to training via backpropagation. Second, we develop a training procedure, based on a new weakly supervised ranking loss, to learn parameters of the architecture in an end-to-end manner from images depicting the same places over time downloaded from Google Street View Time Machine. Finally, we show that the proposed architecture significantly outperforms non-learnt image representations and off-the-shelf CNN descriptors on two challenging place recognition benchmarks, and improves over current state-of-the-art compact image representations on standard image retrieval benchmarks.
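The abstract names two technical pieces without spelling them out: the NetVLAD layer and the weakly supervised ranking loss. Below is a minimal pure-Python sketch of both forward computations, assuming the formulation as we understand it from the paper. Each local descriptor x_i is softly assigned to K cluster centres c_k through a softmax over w_k·x_i + b_k, the weighted residuals x_i − c_k are summed per cluster, and the resulting K×D matrix is intra-normalized (per cluster) and then L2-normalized as a whole. For the loss, each query uses only the closest of its GPS-derived "potential positives" and applies a hinge with margin m to every negative. The function names, the list-of-lists representation and the default `margin=0.1` are illustrative assumptions, not the authors' code.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def soft_assign(x: Vector, W: Matrix, b: Vector) -> Vector:
    """Softmax over clusters of w_k . x + b_k (numerically stabilised)."""
    logits = [sum(wi * xi for wi, xi in zip(w, x)) + bk for w, bk in zip(W, b)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(v: Vector) -> Vector:
    """Scale v to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v] if norm > 0 else list(v)


def netvlad(descriptors: Matrix, centers: Matrix, W: Matrix, b: Vector) -> Vector:
    """Aggregate N local D-dim descriptors into one K*D global descriptor."""
    K, D = len(centers), len(centers[0])
    V = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        a = soft_assign(x, W, b)
        for k in range(K):
            for j in range(D):
                # V(j, k) += a_k(x_i) * (x_i(j) - c_k(j))
                V[k][j] += a[k] * (x[j] - centers[k][j])
    V = [l2_normalize(row) for row in V]  # intra-normalization, one cluster at a time
    flat = [t for row in V for t in row]  # flatten the K x D matrix
    return l2_normalize(flat)  # final L2 normalization


def sq_dist(u: Vector, v: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(u, v))


def weak_triplet_loss(q: Vector, positives: Matrix, negatives: Matrix,
                      margin: float = 0.1) -> float:
    """Hinge loss: best potential positive must beat every negative by `margin`."""
    best_pos = min(sq_dist(q, p) for p in positives)
    return sum(max(0.0, best_pos + margin - sq_dist(q, n)) for n in negatives)
```

In the full architecture the descriptors would come from the last convolutional feature map of the base CNN, and `W`, `b` and `centers` would be trainable parameters updated by backpropagation; keeping `W` and `b` separate from `centers` mirrors the decoupling that makes the layer more flexible than plain VLAD. The `min` over positives is what makes the supervision weak: GPS tags only say which database images are nearby, not which one actually shows the query's view.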



Cited by 1196 articles