Learning deformable action templates from cluttered videos

Abstract
In this paper, we present a Deformable Action Template (DAT) model that is learnable from cluttered real-world videos with weak supervision. In our generative model, an action template is a sequence of image templates, each of which consists of a set of shape and motion primitives (Gabor wavelets and optical-flow patches) at selected orientations and locations. These primitives are allowed to perturb their locations and orientations slightly to account for spatial deformations. We use a shared pursuit algorithm to automatically discover the best set of primitives and weights by maximizing the likelihood over one or more aligned training examples. Since it is extremely hard to accurately label human actions in real-world videos, we use a three-step semi-supervised learning procedure. 1) For each human action class, a template is initialized from a labeled (one bounding box per frame) training video. 2) The template is used to detect actions in other training videos of the same class by a dynamic space-time warping algorithm, which searches for the best match between the template and the target video in the 5D space (x, y, scale, t_template, t_target) using dynamic programming. 3) The template is updated by the shared pursuit algorithm over all aligned videos. Steps 2 and 3 iterate several times to arrive at an optimal action template. We tested our algorithm on a cluttered action dataset (the CMU dataset) and achieved favorable performance. Our classification performance on the KTH dataset is also comparable to the state of the art.
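The temporal alignment in step 2 can be illustrated with a minimal sketch of a dynamic-programming warping over (t_template, t_target), assuming the spatial part of the search (over x, y, scale) has already been collapsed into a per-frame-pair score matrix. The function and variable names below (dstw_align, score) are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def dstw_align(score):
    """Simplified temporal alignment by dynamic programming.

    score[i, j] is assumed to be the best spatial match score of template
    frame i inside target frame j, already maximized over (x, y, scale).
    Returns the total score of the best monotonic warping path and the
    path itself as a list of (i, j) pairs.
    """
    n_template, n_target = score.shape
    NEG = -np.inf
    dp = np.full((n_template, n_target), NEG)
    back = np.zeros((n_template, n_target, 2), dtype=int)

    dp[0, :] = score[0, :]  # the template may start at any target frame
    for i in range(1, n_template):
        for j in range(1, n_target):
            # allowed moves: advance both clocks, or stretch/compress time
            candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
            prev = max(candidates, key=lambda c: dp[c])
            if dp[prev] > NEG:
                dp[i, j] = dp[prev] + score[i, j]
                back[i, j] = prev

    j_end = int(np.argmax(dp[-1]))  # best ending target frame
    path, i, j = [], n_template - 1, j_end
    while True:
        path.append((i, j))
        if i == 0:
            break
        i, j = back[i, j]
    return dp[n_template - 1, j_end], path[::-1]


if __name__ == "__main__":
    # toy example: random match scores for a 5-frame template
    # against a 20-frame target clip
    rng = np.random.default_rng(0)
    score = rng.random((5, 20))
    total, path = dstw_align(score)
    print("alignment score:", total)
    print("warping path:", path)
```

In the paper's full formulation the spatial search is part of the same optimization rather than precomputed, so this sketch only conveys the temporal dynamic-programming structure.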