Recognition using visual phrases
- 1 June 2011
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 1745-1752
- https://doi.org/10.1109/cvpr.2011.5995711
Abstract
In this paper we introduce visual phrases, complex visual composites like “a person riding a horse”. Visual phrases often display significantly reduced visual complexity compared to their component objects, because the appearance of those objects can change profoundly when they participate in relations. We introduce a dataset suitable for phrasal recognition that uses familiar PASCAL object categories, and demonstrate significant experimental gains resulting from exploiting visual phrases. We show that a visual phrase detector significantly outperforms a baseline which detects component objects and reasons about relations, even though visual phrase training sets tend to be smaller than those for objects. We argue that any multi-class detection system must decode detector outputs to produce final results; this is usually done with non-maximum suppression. We describe a novel decoding procedure that can account accurately for local context without solving difficult inference problems. We show this decoding procedure outperforms the state of the art. Finally, we show that decoding a combination of phrasal and object detectors produces real improvements in detector results.
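The abstract names non-maximum suppression as the usual way detector outputs are decoded. For readers unfamiliar with it, the following is a minimal sketch of standard greedy non-maximum suppression only; it is not the decoding procedure proposed in the paper, which this record does not describe. The function names, the corner-coordinate box format, and the 0.5 overlap threshold are illustrative assumptions.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is kept only if it does not overlap any already-kept,
    higher-scoring box by more than iou_threshold (an assumed default).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Hypothetical detections: two heavily overlapping boxes and one separate box.
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))
```

In this sketch each box is judged only against higher-scoring boxes that overlap it, so no information from other classes or from the surrounding detections enters the decision. The abstract's claim is that a decoding step which accounts for local context can do better than this.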
This publication has 11 references indexed in Scilit:
- Modeling mutual context of object and human pose in human-object interaction activities. Published by Institute of Electrical and Electronics Engineers (IEEE), 2010
- Object Detection with Discriminatively Trained Part-Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009
- The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 2009
- POP: Patchwork of Parts Models for Object Recognition. International Journal of Computer Vision, 2007
- Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
- Spatial Priors for Part-Based Recognition Using Statistical Models. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Object class recognition by unsupervised scale-invariant learning. Published by Institute of Electrical and Electronics Engineers (IEEE), 2003
- Efficient optimization of a deformable template using dynamic programming. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001
- Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. International Journal of Computer Vision, 2001