Learning a hierarchy of discriminative space-time neighborhood features for human action recognition

1 June 2010

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 2046-2053
https://doi.org/10.1109/cvpr.2010.5539881

Abstract

Recent work shows how to use local spatio-temporal features to learn models of realistic human actions from video. However, existing methods typically rely on a predefined spatial binning of the local descriptors to impose spatial information beyond a pure “bag-of-words” model, and thus may fail to capture the most informative space-time relationships. We propose to learn the shapes of space-time feature neighborhoods that are most discriminative for a given action category. Given a set of training videos, our method first extracts local motion and appearance features, quantizes them to a visual vocabulary, and then forms candidate neighborhoods consisting of the words associated with nearby points and their orientation with respect to the central interest point. Rather than dictate a particular scaling of the spatial and temporal dimensions to determine which points are near, we show how to learn the class-specific distance functions that form the most informative configurations. Descriptors for these variable-sized neighborhoods are then recursively mapped to higher-level vocabularies, producing a hierarchy of space-time configurations at successively broader scales. Our approach yields state-of-the-art performance on the UCF Sports and KTH datasets.
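The first steps named in the abstract (quantizing local features to a visual vocabulary, then describing each interest point by the words of nearby points and their orientation relative to it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the eight-octant orientation coding, the `k`-nearest-neighbor rule, and the fixed per-axis `scale` weights are all hypothetical. In particular, the paper learns class-specific distance functions, whereas this sketch simply fixes the space-time scaling by hand.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Assign each descriptor to the index of its nearest vocabulary word."""
    # Pairwise squared Euclidean distances, shape (n_points, n_words).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def neighborhood_histograms(points, words, n_words, k=3, scale=(1.0, 1.0, 1.0)):
    """Histogram of (word, octant) pairs over each point's k nearest neighbors.

    points: (n, 3) array of (x, y, t) interest-point positions.
    words:  (n,) integer array of vocabulary indices, one per point.
    scale:  hypothetical fixed per-axis weights (the paper learns the
            distance function instead of fixing it).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    k = min(k, n - 1)  # a point cannot be its own neighbor
    scaled = points * np.asarray(scale, dtype=float)
    hists = np.zeros((n, n_words * 8))
    for i in range(n):
        offsets = scaled - scaled[i]
        dist = np.linalg.norm(offsets, axis=1)
        dist[i] = np.inf  # exclude the central point itself
        for j in np.argsort(dist)[:k]:
            # Orientation relative to the center: one of 8 octants,
            # coded from the signs of (dx, dy, dt).
            octant = (int(offsets[j, 0] >= 0)
                      + 2 * int(offsets[j, 1] >= 0)
                      + 4 * int(offsets[j, 2] >= 0))
            hists[i, words[j] * 8 + octant] += 1
    return hists
```

Each row of the returned array is a neighborhood descriptor. Under the same assumptions, those rows could themselves be clustered and passed through `quantize` again to obtain a higher-level vocabulary, which is the recursive step the abstract describes.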

Cited by 344 articles