Learning a hierarchy of discriminative space-time neighborhood features for human action recognition
- 1 June 2010
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 2046-2053
- https://doi.org/10.1109/cvpr.2010.5539881
Abstract
Recent work shows how to use local spatio-temporal features to learn models of realistic human actions from video. However, existing methods typically rely on a predefined spatial binning of the local descriptors to impose spatial information beyond a pure “bag-of-words” model, and thus may fail to capture the most informative space-time relationships. We propose to learn the shapes of space-time feature neighborhoods that are most discriminative for a given action category. Given a set of training videos, our method first extracts local motion and appearance features, quantizes them to a visual vocabulary, and then forms candidate neighborhoods consisting of the words associated with nearby points and their orientation with respect to the central interest point. Rather than dictate a particular scaling of the spatial and temporal dimensions to determine which points are near, we show how to learn the class-specific distance functions that form the most informative configurations. Descriptors for these variable-sized neighborhoods are then recursively mapped to higher-level vocabularies, producing a hierarchy of space-time configurations at successively broader scales. Our approach yields state-of-the-art performance on the UCF Sports and KTH datasets.
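The first two steps named in the abstract (quantizing local descriptors to a visual vocabulary, then describing each interest point by the words of its space-time neighbors) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `quantize` and `neighborhood_histograms`, the fixed `scale` weights for (x, y, t), the choice of `k` nearest neighbors, and the coarse before/after-in-time split used as a stand-in for orientation are all assumptions made for illustration. In particular, the paper learns class-specific distance functions, whereas this sketch takes the scaling as a given parameter.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest vocabulary word."""
    # Pairwise squared Euclidean distances, shape (n_points, n_words).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def neighborhood_histograms(points, words, n_words, k, scale):
    """Describe each interest point by the words of its k nearest neighbors.

    points: (n, 3) array of (x, y, t) interest point locations.
    words:  (n,) integer word index per point (from quantize).
    scale:  (3,) per-axis weights defining "nearness" (hypothetical; fixed
            here, whereas the paper learns the distance function per class).
    Returns an (n, 2 * n_words) array: word counts for neighbors earlier in
    time, followed by word counts for neighbors at the same time or later.
    """
    n = len(points)
    scaled = points * scale
    d2 = ((scaled[:, None, :] - scaled[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    hist = np.zeros((n, 2, n_words))
    for i in range(n):
        for j in np.argsort(d2[i])[:k]:
            # Coarse orientation stand-in: is neighbor j before or after i?
            side = int(points[j, 2] >= points[i, 2])
            hist[i, side, words[j]] += 1
    return hist.reshape(n, -1)

# Toy data: four interest points with 2-D descriptors and a 2-word vocabulary.
vocabulary = np.array([[0.0, 0.0], [1.0, 1.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.0], [0.0, 0.2], [1.0, 0.8]])
points = np.array([[0, 0, 0], [1, 0, 1], [0, 1, 2], [10, 10, 3]], dtype=float)

words = quantize(descriptors, vocabulary)
neigh = neighborhood_histograms(points, words, n_words=2, k=2,
                                scale=np.array([1.0, 1.0, 1.0]))
```

Under the same assumptions, the recursive step described in the abstract would correspond to passing `neigh` back through `quantize` with a higher-level vocabulary built from neighborhood descriptors, then forming neighborhoods again at a broader scale.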
This publication has 25 references indexed in Scilit:
- Local Trinary Patterns for human action recognition. Published by Institute of Electrical and Electronics Engineers (IEEE), 2009
- Foreground Focus: Unsupervised Learning from Partially Matching Images. International Journal of Computer Vision, 2009
- Evaluation of local spatio-temporal features for action recognition. Published by British Machine Vision Association and Society for Pattern Recognition, 2009
- Learning realistic human actions from movies. Published by Institute of Electrical and Electronics Engineers (IEEE), 2008
- A Spatio-Temporal Descriptor Based on 3D-Gradients. Published by British Machine Vision Association and Society for Pattern Recognition, 2008
- Actions as Space-Time Shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007
- Discovery of Collocation Patterns: from Visual Words to Visual Phrases. Published by Institute of Electrical and Electronics Engineers (IEEE), 2007
- Action Recognition in Broadcast Tennis Video Using Optical Flow and Support Vector Machine. Lecture Notes in Computer Science, 2006
- On Space-Time Interest Points. International Journal of Computer Vision, 2005
- Recognizing human actions: a local SVM approach. Published by Institute of Electrical and Electronics Engineers (IEEE), 2004