Learning the Compositional Nature of Visual Object Categories for Recognition

Abstract
Real-world scene understanding requires recognizing object categories in novel visual scenes. This paper describes a composition system that learns structured, hierarchical object representations in an unsupervised manner, without requiring manual segmentation or object localization. A central concept for learning object models in the challenging, general case of unconstrained scenes, large intraclass variations, large numbers of categories, and a lack of supervision is to exploit the compositional nature of our (visual) world. The compositional nature of visual objects significantly limits their representation complexity and renders learning of structured object models statistically and computationally tractable. We propose a robust descriptor for local image parts and show how characteristic compositions of parts, based on an unspecific part vocabulary shared among all categories, can be learned. Moreover, a Bayesian network is presented that combines the compositional constituents with scene context and object shape. Object recognition is then formulated as a statistical inference problem in this probabilistic model.
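To make the final sentence of the abstract concrete, the following is a minimal illustrative sketch, not the paper's actual model: it assumes a naive factorization P(c | g_1..g_n) ∝ P(c) · ∏_j P(g_j | c), where c is an object category and each g_j indexes an observed part composition drawn from a shared vocabulary. All names, sizes, and distributions here are hypothetical.

```python
import numpy as np

# Hypothetical setup: 3 categories, a shared vocabulary of 5 composition types.
rng = np.random.default_rng(0)
n_categories, n_compositions = 3, 5

prior = np.full(n_categories, 1.0 / n_categories)      # P(c), uniform
likelihood = rng.dirichlet(np.ones(n_compositions),
                           size=n_categories)          # P(g | c), rows sum to 1


def posterior(observed):
    """Posterior over categories given observed composition indices.

    Computes log P(c) + sum_j log P(g_j | c) per category,
    then normalizes in a numerically stable way.
    """
    log_p = np.log(prior) + sum(np.log(likelihood[:, g]) for g in observed)
    p = np.exp(log_p - log_p.max())
    return p / p.sum()


post = posterior([0, 2, 2])
print(post)  # a proper distribution over the 3 categories
```

The actual model in the paper couples compositions with scene context and object shape in a Bayesian network, so inference there is over a richer graph than this flat naive-Bayes factorization.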
