Action bank: A high-level representation of activity in video

Abstract

Activity recognition in video is dominated by low- and mid-level features. While demonstrably capable, these features by nature carry little semantic meaning. Inspired by the recent object bank approach to image representation, we present Action Bank, a new high-level representation of video. Action Bank comprises many individual action detectors sampled broadly in semantic space as well as viewpoint space. Our representation is constructed to be semantically rich and, even when paired with simple linear SVM classifiers, is capable of highly discriminative performance. We have tested Action Bank on four major activity recognition benchmarks. In all cases, our performance is better than the state of the art: 98.2% on KTH (better by 3.3%), 95.0% on UCF Sports (better by 3.7%), 57.9% on UCF50 (baseline is 47.9%), and 26.9% on HMDB51 (baseline is 23.2%). Furthermore, when we analyze the classifiers, we find strong transfer of semantics from the constituent action detectors to the bank classifier.
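
The idea of a detector bank feeding a linear classifier can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the detectors, the pooling, or the training procedure, so everything below is an assumption made for illustration. The "videos" are hypothetical 1-D signals, the two detectors (`wave`, `step`) are hypothetical templates, each detector's response is max-pooled into one feature, and the linear SVM is trained with plain subgradient descent on the hinge loss.

```python
import random

# Hypothetical two-detector bank; real detectors would operate on video volumes.
BANK = {"wave": [1, -1, 1, -1], "step": [-1, -1, 1, 1]}

def correlate_max(signal, template):
    """Max sliding dot-product response of one detector template over a signal."""
    n, m = len(signal), len(template)
    return max(
        sum(signal[i + j] * template[j] for j in range(m))
        for i in range(n - m + 1)
    )

def bank_features(signal, bank):
    """One max-pooled response per detector: the high-level feature vector."""
    return [correlate_max(signal, t) for t in bank]

def make_signal(pattern, rng, length=12):
    """Toy stand-in for a video: low-amplitude noise with one embedded pattern."""
    signal = [rng.gauss(0.0, 0.05) for _ in range(length)]
    start = rng.randrange(length - len(pattern) + 1)
    for j, v in enumerate(pattern):
        signal[start + j] += v
    return signal

def train_linear_svm(X, y, epochs=200, lr=0.01, lam=0.01):
    """Subgradient descent on the L2-regularised hinge loss; labels are +1/-1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1:
                # Margin violated: regulariser shrink plus a step toward the sample.
                w = [wk - lr * (lam * wk - yi * xk) for wk, xk in zip(w, xi)]
                b += lr * yi
            else:
                # Margin satisfied: only the regulariser shrinks the weights.
                w = [wk - lr * lam * wk for wk in w]
    return w, b

def predict(w, b, x):
    """Sign of the linear decision function, returned as +1 or -1."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1

rng = random.Random(0)
templates = list(BANK.values())
X, y = [], []
for _ in range(20):
    X.append(bank_features(make_signal(BANK["wave"], rng), templates))
    y.append(1)   # class +1: signals containing the "wave" pattern
    X.append(bank_features(make_signal(BANK["step"], rng), templates))
    y.append(-1)  # class -1: signals containing the "step" pattern

w, b = train_linear_svm(X, y)
print("weights per detector:", dict(zip(BANK, w)))
```

In this toy setting each learned weight is tied to a named detector, so the classifier can be read off directly: a positive weight on `wave` and a negative weight on `step` for the "wave" class. That is a loose analogue of the semantic transfer the abstract reports, under the assumptions stated above.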

Cited by 444 articles