Learning the Compositional Nature of Visual Object Categories for Recognition
- 23 January 2009
- journal article
- Published by the Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence
- Vol. 32 (3), 501-516
- https://doi.org/10.1109/tpami.2009.22
Abstract
Real-world scene understanding requires recognizing object categories in novel visual scenes. This paper describes a composition system that automatically learns structured, hierarchical object representations in an unsupervised manner, without requiring manual segmentation or manual object localization. A central concept for learning object models in the challenging general case (unconstrained scenes, large intraclass variations, large numbers of categories, and missing supervision information) is to exploit the compositional nature of our (visual) world. The compositional nature of visual objects significantly limits their representation complexity and renders the learning of structured object models statistically and computationally tractable. We propose a robust descriptor for local image parts and show how characteristic compositions of parts can be learned on top of an unspecific part vocabulary shared among all categories. Moreover, a Bayesian network is presented that comprises all the compositional constituents together with scene context and object shape. Object recognition is then formulated as a statistical inference problem in this probabilistic model.
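The abstract describes learning compositions of parts over a part vocabulary shared by all categories, and then recognizing objects by statistical inference. The Python sketch below illustrates that idea under heavy simplifying assumptions and is not the authors' model: local descriptors are arbitrary numeric tuples (standing in for the paper's robust part descriptor), the shared vocabulary is a given list of prototypes, a "composition" is simply a part paired with its spatially nearest neighbor, and a multinomial naive Bayes classifier over composition counts replaces the paper's Bayesian network with scene context and object shape. All names (`quantize`, `compositions`, `train`, `recognize`, `Model`) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, NamedTuple, Sequence, Tuple

# Hypothetical types: a part is (x, y, descriptor); a composition is an
# unordered pair of shared-vocabulary indices.
Part = Tuple[float, float, Tuple[float, ...]]
Composition = Tuple[int, int]


class Model(NamedTuple):
    priors: Counter              # number of training images per category
    counts: Dict[str, Counter]   # composition counts per category
    alpha: float                 # Laplace smoothing constant


def quantize(desc: Sequence[float], vocabulary: Sequence[Sequence[float]]) -> int:
    """Index of the nearest prototype (squared Euclidean) in the shared vocabulary."""
    return min(range(len(vocabulary)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, vocabulary[k])))


def compositions(parts: List[Part], vocabulary) -> List[Composition]:
    """Pair each part with its spatially nearest other part (simplified composition rule)."""
    labels = [quantize(d, vocabulary) for _, _, d in parts]
    comps = []
    for i, (xi, yi, _) in enumerate(parts):
        others = [j for j in range(len(parts)) if j != i]
        if not others:
            continue
        j = min(others, key=lambda m: (parts[m][0] - xi) ** 2 + (parts[m][1] - yi) ** 2)
        a, b = labels[i], labels[j]
        comps.append((min(a, b), max(a, b)))
    return comps


def train(images, vocabulary, alpha: float = 1.0) -> Model:
    """Count compositions per category; the vocabulary itself is shared by all categories."""
    priors: Counter = Counter()
    counts: Dict[str, Counter] = {}
    for parts, category in images:
        priors[category] += 1
        counts.setdefault(category, Counter()).update(compositions(parts, vocabulary))
    return Model(priors, counts, alpha)


def recognize(parts: List[Part], vocabulary, model: Model) -> str:
    """MAP category under a multinomial naive Bayes model of composition counts."""
    k = len(vocabulary)
    n_keys = k * (k + 1) // 2  # number of possible unordered vocabulary pairs
    n_images = sum(model.priors.values())
    comps = compositions(parts, vocabulary)
    best, best_score = None, -math.inf
    for c, n_c in model.priors.items():
        total = sum(model.counts[c].values())
        score = math.log(n_c / n_images)  # log prior
        for comp in comps:  # add smoothed log-likelihood of each composition
            score += math.log((model.counts[c][comp] + model.alpha)
                              / (total + model.alpha * n_keys))
        if score > best_score:
            best, best_score = c, score
    return best


# Toy usage: two prototypes, one training image per category.
vocabulary = [(0.0,), (1.0,)]
train_images = [
    ([(0, 0, (0.1,)), (1, 0, (0.0,)), (5, 5, (0.2,)), (6, 5, (0.1,))], "A"),
    ([(0, 0, (0.0,)), (1, 0, (0.9,)), (5, 5, (1.0,)), (6, 5, (0.1,))], "B"),
]
model = train(train_images, vocabulary)
print(recognize([(0, 0, (0.05,)), (1, 0, (0.95,))], vocabulary, model))  # → B
```

In the toy data, category "A" only ever pairs parts that quantize to prototype 0, while "B" pairs prototype 0 with prototype 1, so a query containing one part of each kind is assigned to "B". The smoothing constant `alpha` keeps compositions unseen during training from driving a category's likelihood to zero.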