Actions in context

1 June 2009

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

No. 10636919, pp. 2929-2936
https://doi.org/10.1109/cvpr.2009.5206557

Abstract

This paper exploits the context of natural dynamic scenes for human action recognition in video. Human actions are frequently constrained by the purpose and the physical properties of scenes and demonstrate high correlation with particular scene classes. For example, eating often happens in a kitchen while running is more common outdoors. The contribution of this paper is three-fold: (a) we automatically discover relevant scene classes and their correlation with human actions, (b) we show how to learn selected scene classes from video without manual supervision and (c) we develop a joint framework for action and scene recognition and demonstrate improved recognition of both in natural video. We use movie scripts as a means of automatic supervision for training. For selected action classes we identify correlated scene classes in text and then retrieve video samples of actions and scenes for training using script-to-video alignment. Our visual models for scenes and actions are formulated within the bag-of-features framework and are combined in a joint scene-action SVM-based classifier. We report experimental results and validate the method on a new large dataset with twelve action classes and ten scene classes acquired from 69 movies.
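The idea of combining scene context with action recognition can be illustrated with a small sketch. It is not the paper's joint SVM formulation; it is a simplified linear rescoring, assuming that text mining has already produced (action, scene) label pairs and that separate action and scene classifiers have already produced scores for one clip. The function names, the weight parameter, and all toy values are hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence(pairs):
    """Estimate p(scene | action) from (action, scene) label pairs."""
    counts = defaultdict(Counter)
    for action, scene in pairs:
        counts[action][scene] += 1
    return {
        action: {scene: n / sum(c.values()) for scene, n in c.items()}
        for action, c in counts.items()
    }


def rescore_actions(action_scores, scene_scores, p_scene_given_action, weight=0.5):
    """Add a context term to each action score.

    The context term is the sum of scene scores weighted by p(scene | action).
    This linear combination is an assumption made for illustration only.
    """
    rescored = {}
    for action, score in action_scores.items():
        prior = p_scene_given_action.get(action, {})
        context = sum(p * scene_scores.get(scene, 0.0) for scene, p in prior.items())
        rescored[action] = score + weight * context
    return rescored


# Hypothetical label pairs, as might be mined from aligned scripts.
pairs = [
    ("eat", "kitchen"),
    ("eat", "kitchen"),
    ("eat", "restaurant"),
    ("run", "outdoors"),
    ("run", "outdoors"),
]
p = cooccurrence(pairs)

# Hypothetical classifier scores for one clip: the action scores alone
# slightly favour "run", but the scene scores suggest a kitchen.
action_scores = {"eat": 0.1, "run": 0.2}
scene_scores = {"kitchen": 0.9, "restaurant": 0.1, "outdoors": -0.8}

rescored = rescore_actions(action_scores, scene_scores, p)
best_action = max(rescored, key=rescored.get)
```

In this toy case the scene evidence moves the decision from "run" to "eat", which mirrors the abstract's observation that eating is correlated with kitchen scenes while running is more common outdoors.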

This publication has 19 references indexed in Scilit:

Learning realistic human actions from movies
Published by Institute of Electrical and Electronics Engineers (IEEE), 2008
Hierarchical Recognition of Human Activities Interacting with Objects
Published by Institute of Electrical and Electronics Engineers (IEEE), 2007
What, where and who? Classifying events by scene and object recognition
Published by Institute of Electrical and Electronics Engineers (IEEE), 2007
Objects in Context
Published by Institute of Electrical and Electronics Engineers (IEEE), 2007
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories
Published by Institute of Electrical and Electronics Engineers (IEEE), 2006
Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words
Published by British Machine Vision Association and Society for Pattern Recognition, 2006
On Space-Time Interest Points
International Journal of Computer Vision, 2005
Distinctive Image Features from Scale-Invariant Keypoints
International Journal of Computer Vision, 2004
Scale & Affine Invariant Interest Point Detectors
International Journal of Computer Vision, 2004
Exploiting human actions and object context for recognition tasks
Published by Institute of Electrical and Electronics Engineers (IEEE), 1999

Cited by 679 articles