Recognition using visual phrases

1 June 2011

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 1745-1752
https://doi.org/10.1109/cvpr.2011.5995711

Abstract

In this paper we introduce visual phrases, complex visual composites like “a person riding a horse”. Visual phrases often display significantly reduced visual complexity compared to their component objects, because the appearance of those objects can change profoundly when they participate in relations. We introduce a dataset suitable for phrasal recognition that uses familiar PASCAL object categories, and demonstrate significant experimental gains resulting from exploiting visual phrases. We show that a visual phrase detector significantly outperforms a baseline that detects component objects and reasons about relations, even though visual phrase training sets tend to be smaller than those for objects. We argue that any multi-class detection system must decode detector outputs to produce final results; this is usually done with non-maximum suppression. We describe a novel decoding procedure that can account accurately for local context without solving difficult inference problems. We show that this decoding procedure outperforms the state of the art. Finally, we show that decoding a combination of phrasal and object detectors produces clear improvements in detector results.
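The abstract notes that detector outputs are usually decoded with non-maximum suppression (NMS). The sketch below shows standard greedy, single-class NMS, assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples and an intersection-over-union (IoU) threshold of 0.5; the function names `iou` and `nms` and the threshold value are illustrative choices, not taken from the paper. It shows the conventional decoding step the abstract refers to, not the paper's context-aware decoding procedure.

```python
from typing import List, Sequence, Tuple

# Hypothetical box convention for this sketch: (x1, y1, x2, y2), with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    A box is discarded if it overlaps an already-kept, higher-scoring box
    by more than iou_threshold. Only geometry within one class is used;
    no cross-class context is considered.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two heavily overlapping detections and one separate detection.
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # the second box (IoU about 0.68 with the first) is suppressed
```

Because this form of NMS compares only overlapping boxes of a single class, it cannot use cues such as a confident horse detection supporting a nearby “person riding a horse” detection; accounting for that kind of local context is what the decoding procedure described in the abstract targets.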

Cited by 247 articles