Efficient Visual Search for Objects in Videos
- 14 March 2008
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in Proceedings of the IEEE
- Vol. 96 (4), 548-566
- https://doi.org/10.1109/jproc.2008.916343
Abstract
We describe an approach to generalize the concept of text-based search to nontextual information. In particular, we elaborate on the possibilities of retrieving objects or scenes in a movie with the ease, speed, and accuracy with which Google retrieves web pages containing particular words, by specifying the query as an image of the object or scene. In our approach, each frame of the video is represented by a set of viewpoint-invariant region descriptors. These descriptors enable recognition to proceed successfully despite changes in viewpoint, illumination, and partial occlusion. Vector quantizing these region descriptors provides a visual analogy of a word, which we term a "visual word." Efficient retrieval is then achieved by employing methods from statistical text retrieval, including inverted file systems and text and document frequency weightings. The final ranking also depends on the spatial layout of the regions. Object retrieval results are reported on the full-length feature films "Groundhog Day," "Charade," and "Pretty Woman," including searches from within the movie and searches specified by external images downloaded from the Internet. We discuss three research directions for the presented video retrieval approach and review some recent work addressing them: 1) building visual vocabularies for very large-scale retrieval; 2) retrieval of 3-D objects; and 3) more thorough verification and ranking using the spatial structure of objects.
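The retrieval pipeline summarized in the abstract can be sketched in a few lines of Python. The steps are: quantize descriptors into visual words, index frames in an inverted file, weight words by term and document frequency, and rank frames by similarity to the query. This is a minimal sketch under stated assumptions, not the authors' implementation. The vocabulary is taken as given; the abstract says it comes from vector quantization, and here it is a hypothetical four-word set of 2-D centres standing in for real region descriptors. The weighting is assumed to be the common tf-idf form t_i = (n_id / n_d) * log(N / N_i), where n_id is the count of word i in frame d, n_d the number of words in d, N the number of frames and N_i the number of frames containing word i. Ranking is assumed to use cosine similarity. The names `quantize` and `VisualWordIndex` are hypothetical. The spatial-layout re-ranking step is not shown.

```python
import math
from collections import Counter, defaultdict


def quantize(descriptor, vocabulary):
    """Map a descriptor to the index of its nearest cluster centre (its visual word)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, vocabulary[k])),
    )


class VisualWordIndex:
    """Hypothetical inverted-file index over frames represented as bags of visual words."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary
        self.inverted = defaultdict(dict)  # visual word -> {frame_id: occurrence count}
        self.frame_counts = {}             # frame_id -> Counter of visual words
        self.frame_lengths = {}            # frame_id -> total words in frame (n_d)

    def add_frame(self, frame_id, descriptors):
        counts = Counter(quantize(d, self.vocabulary) for d in descriptors)
        for word, count in counts.items():
            self.inverted[word][frame_id] = count
        self.frame_counts[frame_id] = counts
        self.frame_lengths[frame_id] = sum(counts.values())

    def idf(self, word):
        # Assumed weighting: log(N / N_i), N frames in total, N_i containing the word.
        containing = len(self.inverted.get(word, {}))
        if containing == 0:
            return 0.0
        return math.log(len(self.frame_lengths) / containing)

    def _frame_norm(self, frame_id):
        # Euclidean norm of the frame's tf-idf vector, used for cosine similarity.
        n_d = self.frame_lengths[frame_id]
        return math.sqrt(sum(((c / n_d) * self.idf(w)) ** 2
                             for w, c in self.frame_counts[frame_id].items()))

    def query(self, descriptors, top_k=5):
        counts = Counter(quantize(d, self.vocabulary) for d in descriptors)
        n_q = sum(counts.values())
        q_vec = {w: (c / n_q) * self.idf(w) for w, c in counts.items()}
        q_norm = math.sqrt(sum(v * v for v in q_vec.values()))
        # Accumulate dot products only for frames that share a word with the query.
        dots = defaultdict(float)
        for word, q_weight in q_vec.items():
            for frame_id, c in self.inverted.get(word, {}).items():
                f_weight = (c / self.frame_lengths[frame_id]) * self.idf(word)
                dots[frame_id] += q_weight * f_weight
        # Normalize to cosine similarity and sort best-first.
        ranked = []
        for frame_id, dot in dots.items():
            denom = q_norm * self._frame_norm(frame_id)
            if denom > 0:
                ranked.append((frame_id, dot / denom))
        ranked.sort(key=lambda item: item[1], reverse=True)
        return ranked[:top_k]


# Toy usage: a hypothetical 4-word vocabulary of 2-D centres.
vocabulary = [(0, 0), (10, 0), (0, 10), (10, 10)]
index = VisualWordIndex(vocabulary)
index.add_frame("f1", [(0.2, 0.1), (0.1, 0.3), (9.8, 0.2)])  # words 0, 0, 1
index.add_frame("f2", [(0.3, 9.7), (9.9, 10.2)])             # words 2, 3
index.add_frame("f3", [(0.1, 0.1), (10.1, 9.9)])             # words 0, 3
results = index.query([(9.7, 0.1), (0.2, 0.2)])              # words 1, 0
print(results)
```

Because of the inverted file, the query only touches frames that share at least one visual word with it, so frame f2 is never scored. Rarer words carry a higher document-frequency weight, which is why f1, which contains the query's rarer word, ranks above f3.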