Learning mid-level features for recognition

1 June 2010

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 2559-2566
https://doi.org/10.1109/cvpr.2010.5539963

Abstract

Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise transformation of the descriptors into a representation better adapted to the task, and (2) a pooling step, which summarizes the coded features over larger neighborhoods. Several combinations of coding and pooling schemes have been proposed in the literature. The goal of this paper is threefold. We seek to establish the relative importance of each step of mid-level feature extraction through a comprehensive cross evaluation of several types of coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (by taking the average, or the maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. We show how to improve the best performing coding scheme by learning a supervised discriminative dictionary for sparse coding. We provide theoretical and empirical insight into the remarkable performance of max pooling. By teasing apart components shared by modern mid-level feature extractors, our approach aims to facilitate the design of better recognition architectures.
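The two-step pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`hard_vq`, `soft_vq`, `pool`), the softness parameter `beta`, and the toy data are assumptions made for illustration, and sparse coding and supervised dictionary learning are omitted. It only shows how a coding step (hard or soft vector quantization against a codebook) feeds a pooling step (average or max) that summarizes codes over a neighborhood.

```python
import numpy as np


def squared_distances(descriptors, codebook):
    """Squared Euclidean distances, shape (n_descriptors, n_codewords)."""
    diff = descriptors[:, None, :] - codebook[None, :, :]
    return (diff ** 2).sum(axis=2)


def hard_vq(descriptors, codebook):
    """Hard vector quantization: one-hot code for the nearest codeword."""
    d2 = squared_distances(descriptors, codebook)
    codes = np.zeros_like(d2)
    codes[np.arange(len(descriptors)), d2.argmin(axis=1)] = 1.0
    return codes


def soft_vq(descriptors, codebook, beta=1.0):
    """Soft vector quantization: softmax over negative scaled distances.

    `beta` is a hypothetical softness parameter chosen for this sketch.
    """
    logits = -beta * squared_distances(descriptors, codebook)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)


def pool(codes, mode="max"):
    """Summarize the codes of one neighborhood into a single feature vector."""
    if mode == "max":
        return codes.max(axis=0)
    if mode == "average":
        return codes.mean(axis=0)
    raise ValueError(f"unknown pooling mode: {mode!r}")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    descriptors = rng.normal(size=(50, 8))  # toy stand-in for SIFT descriptors
    codebook = rng.normal(size=(16, 8))     # toy stand-in for a learned codebook
    for coder in (hard_vq, soft_vq):
        codes = coder(descriptors, codebook)
        for mode in ("average", "max"):
            print(coder.__name__, mode, pool(codes, mode).shape)
```

With hard vector quantization, average pooling yields a normalized histogram of codeword counts, while max pooling yields a binary indicator of which codewords occur at least once in the neighborhood.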

This publication has 21 references indexed in Scilit.

Cited by 650 articles