Learning realistic human actions from movies

1 June 2008

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

No. 10636919, p. 1-8
https://doi.org/10.1109/cvpr.2008.4587756

Abstract

The aim of this paper is to address recognition of natural human actions in diverse and realistic video settings. This challenging but important subject has mostly been ignored in the past due to several problems, one of which is the lack of realistic and annotated video datasets. Our first contribution is to address this limitation and to investigate the use of movie scripts for automatic annotation of human actions in videos. We evaluate alternative methods for action retrieval from scripts and show the benefits of a text-based classifier. Using the retrieved action samples for visual learning, we next turn to the problem of action classification in video. We present a new method for video classification that builds upon and extends several recent ideas, including local space-time features, space-time pyramids and multi-channel non-linear SVMs. The method is shown to improve state-of-the-art results on the standard KTH action dataset by achieving 91.8% accuracy. Given the inherent problem of noisy labels in automatic annotation, we particularly investigate and show high tolerance of our method to annotation errors in the training set. We finally apply the method to learning and classifying challenging action classes in movies and show promising results.
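
The abstract names "multi-channel non-linear SVMs" without stating the kernel. The following is a minimal sketch of one common way such a kernel can be built, assuming each video is described by one histogram per channel (for example, one per space-time pyramid cell) and assuming a chi-square distance normalised by its mean per channel. The function names, the normalisation, and the toy data are illustrative assumptions, not details taken from this record.

```python
import numpy as np


def chi2_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two histograms: 0.5 * sum((a-b)^2 / (a+b))."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))


def channel_distance_matrix(X):
    """Pairwise chi-square distances for one channel; X has shape (n_samples, n_bins)."""
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = chi2_distance(X[i], X[j])
    return D


def multichannel_kernel(channels):
    """Combine channels as K = exp(-sum_c D_c / mean(D_c)).

    `channels` is a list of (n_samples, n_bins_c) arrays, one per channel.
    Normalising each channel by its mean off-diagonal distance is an
    assumption of this sketch, not something stated in the abstract.
    """
    n = len(channels[0])
    total = np.zeros((n, n))
    off_diagonal = ~np.eye(n, dtype=bool)
    for X in channels:
        D = channel_distance_matrix(X)
        off = D[off_diagonal]
        mean_d = off.mean() if off.size else 0.0
        if mean_d > 0:
            total += D / mean_d
    return np.exp(-total)


# Toy data (hypothetical): 5 videos, two channels with 8 and 16 histogram bins.
rng = np.random.default_rng(0)
channels = [rng.random((5, 8)), rng.random((5, 16))]
K = multichannel_kernel(channels)
```

The resulting matrix `K` is symmetric with ones on the diagonal, so it could be passed to an SVM implementation that accepts a precomputed kernel; which implementation and parameters to use is left open here.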

Cited by 2257 articles