Efficient Visual Search for Objects in Videos

14 March 2008

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in Proceedings of the IEEE

Vol. 96 (4), 548-566
https://doi.org/10.1109/jproc.2008.916343

Abstract

We describe an approach to generalize the concept of text-based search to nontextual information. In particular, we elaborate on the possibilities of retrieving objects or scenes in a movie with the ease, speed, and accuracy with which Google retrieves web pages containing particular words, by specifying the query as an image of the object or scene. In our approach, each frame of the video is represented by a set of viewpoint invariant region descriptors. These descriptors enable recognition to proceed successfully despite changes in viewpoint, illumination, and partial occlusion. Vector quantizing these region descriptors provides a visual analogy of a word, which we term a "visual word." Efficient retrieval is then achieved by employing methods from statistical text retrieval, including inverted file systems, and text and document frequency weightings. The final ranking also depends on the spatial layout of the regions. Object retrieval results are reported on the full length feature films "Groundhog Day," "Charade," and "Pretty Woman," including searches from within the movie and also searches specified by external images downloaded from the Internet. We discuss three research directions for the presented video retrieval approach and review some recent work addressing them: 1) building visual vocabularies for very large-scale retrieval; 2) retrieval of 3-D objects; and 3) more thorough verification and ranking using the spatial structure of objects.
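
The retrieval pipeline summarized in the abstract (vector quantization into visual words, an inverted file, and frequency weighting) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy 2-D descriptors, the three-word vocabulary, and the particular tf-idf form (term frequency times the log of inverse document frequency, scored by an unnormalized dot product) are all assumptions made for illustration. Spatial-layout re-ranking is omitted.

```python
import math
from collections import Counter, defaultdict


def quantize(descriptor, vocabulary):
    """Return the index of the nearest vocabulary centre (the 'visual word')."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[k])),
    )


def build_index(frames, vocabulary):
    """Build an inverted file: word -> {frame_id: term frequency}, plus idf per word."""
    inverted = defaultdict(dict)
    for frame_id, descriptors in frames.items():
        counts = Counter(quantize(d, vocabulary) for d in descriptors)
        total = sum(counts.values())
        for word, n in counts.items():
            inverted[word][frame_id] = n / total
    n_frames = len(frames)
    # Assumed weighting: idf = log(number of frames / frames containing the word).
    idf = {w: math.log(n_frames / len(p)) for w, p in inverted.items()}
    return inverted, idf


def search(query_descriptors, vocabulary, inverted, idf):
    """Rank frames by the dot product of tf-idf vectors, touching only
    frames that share at least one visual word with the query."""
    counts = Counter(quantize(d, vocabulary) for d in query_descriptors)
    total = sum(counts.values())
    scores = defaultdict(float)
    for word, n in counts.items():
        if word not in inverted:
            continue
        query_weight = (n / total) * idf[word]
        for frame_id, tf in inverted[word].items():
            scores[frame_id] += query_weight * tf * idf[word]
    return sorted(scores.items(), key=lambda item: -item[1])


# Hypothetical toy data: three visual words and three frames.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = {
    "f1": [(0.1, 0.0), (0.9, 1.1)],
    "f2": [(5.1, 4.9), (4.8, 5.2)],
    "f3": [(0.0, 0.2), (5.0, 5.1)],
}
inverted, idf = build_index(frames, vocabulary)
results = search([(5.0, 5.0)], vocabulary, inverted, idf)
ranked_ids = [frame_id for frame_id, _ in results]
```

Because the inverted file maps each visual word to the frames containing it, a query only visits frames that share a word with it, which is what makes this style of lookup fast on long videos. A real system would use high-dimensional descriptors and a much larger vocabulary learned by clustering.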
