Unsupervised and Supervised Visual Codes with Restricted Boltzmann Machines

1 January 2012

book chapter
conference paper
Published by Springer Science and Business Media LLC in Lecture Notes in Computer Science

pp. 298-311
https://doi.org/10.1007/978-3-642-33715-4_22

Abstract

Recently, the coding of local features (e.g. SIFT) for image categorization tasks has been extensively studied. Incorporated within the Bag of Words (BoW) framework, these techniques optimize the projection of local features into the visual codebook, leading to state-of-the-art performance on many benchmark datasets. In this work, we propose a novel visual codebook learning approach using the restricted Boltzmann machine (RBM) as our generative model. Our contribution is three-fold. Firstly, we steer the unsupervised RBM learning using a regularization scheme, which decomposes into a combined prior for the sparsity of each feature’s representation and the selectivity of each codeword. The codewords are then fine-tuned to be discriminative through supervised learning from top-down labels. Secondly, we evaluate the proposed method on the Caltech-101 and 15-Scenes datasets, either matching or outperforming state-of-the-art results. The codebooks are compact and inference is fast. Finally, we introduce an original method to visualize the codebooks and decipher what each visual codeword encodes.
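
The two quantities named in the abstract, sparsity per feature and selectivity per codeword, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that an RBM hidden unit plays the role of a codeword, that a local feature is encoded by the usual sigmoid hidden-unit probabilities, and that sparsity and selectivity are summarized as mean activations across codewords and across features respectively. The toy weights, biases, and function names are all hypothetical.

```python
import math


def sigmoid(a):
    """Logistic function used for RBM hidden-unit probabilities."""
    return 1.0 / (1.0 + math.exp(-a))


def encode(features, weights, hidden_bias):
    """Return hidden-unit activation probabilities, one row per feature.

    weights[k][d] links input dimension d to codeword (hidden unit) k.
    """
    codes = []
    for x in features:
        row = []
        for w_k, b_k in zip(weights, hidden_bias):
            pre_activation = sum(w * v for w, v in zip(w_k, x)) + b_k
            row.append(sigmoid(pre_activation))
        codes.append(row)
    return codes


def mean_activation_per_feature(codes):
    """Assumed sparsity summary: average over codewords for each feature."""
    return [sum(row) / len(row) for row in codes]


def mean_activation_per_codeword(codes):
    """Assumed selectivity summary: average over features for each codeword."""
    n_features = len(codes)
    n_codewords = len(codes[0])
    return [sum(row[k] for row in codes) / n_features for k in range(n_codewords)]


# Toy data (hypothetical): three 2-D "local features", three codewords.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[4.0, -4.0], [-4.0, 4.0], [0.0, 0.0]]
hidden_bias = [-1.0, -1.0, -1.0]

codes = encode(features, weights, hidden_bias)
sparsity = mean_activation_per_feature(codes)
selectivity = mean_activation_per_codeword(codes)
```

A regularizer of the kind the abstract describes would push both sets of averages toward low target values during training. The sketch only measures them on fixed weights.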

