Learning deformable action templates from cluttered videos
- 1 September 2009
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 1507-1514
- https://doi.org/10.1109/iccv.2009.5459277
Abstract
In this paper, we present a Deformable Action Template (DAT) model that is learnable from cluttered real-world videos with weak supervision. In our generative model, an action template is a sequence of image templates, each of which consists of a set of shape and motion primitives (Gabor wavelets and optical-flow patches) at selected orientations and locations. These primitives are allowed to slightly perturb their locations and orientations to account for spatial deformations. We use a shared pursuit algorithm to automatically discover the best set of primitives and weights by maximizing the likelihood over one or more aligned training examples. Since it is extremely hard to accurately label human actions in real-world videos, we use a three-step semi-supervised learning procedure. 1) For each human action class, a template is initialized from a labeled (one bounding box per frame) training video. 2) The template is used to detect actions in other training videos of the same class by a dynamic space-time warping algorithm, which searches for the best match between the template and the target video in a 5D space (x, y, scale, t_template and t_target) using dynamic programming. 3) The template is updated by the shared pursuit algorithm over all aligned videos. The 2nd and 3rd steps iterate several times to arrive at an optimal action template. We tested our algorithm on a cluttered action dataset (the CMU dataset) and achieved favorable performance. Our classification performance on the KTH dataset is also comparable to the state of the art.
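The dynamic-programming alignment in step 2 can be illustrated with a simplified sketch. The version below warps only along the two time axes (t_template, t_target) with per-frame feature vectors, whereas the paper's search also covers x, y, and scale; the function name and feature representation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dtw_align(template_feats, target_feats):
    """Simplified dynamic time warping between a template and a target
    sequence. Hypothetical sketch: aligns time axes only, not the full
    5D (x, y, scale, t_template, t_target) search described in the paper."""
    n, m = len(template_feats), len(target_feats)
    # Pairwise frame-matching cost (Euclidean distance between features).
    cost = np.array([[np.linalg.norm(a - b) for b in target_feats]
                     for a in template_feats])
    # D[i, j] = minimal cumulative cost of aligning the first i template
    # frames with the first j target frames.
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j - 1],  # match
                                               D[i - 1, j],      # skip target
                                               D[i, j - 1])      # skip template
    # Backtrack to recover the frame-to-frame correspondence path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]
```

In the full algorithm, the per-frame cost would come from the template's Gabor and optical-flow primitive responses at each candidate (x, y, scale), rather than a plain feature distance.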