STAP: Spatial-Temporal Attention-Aware Pooling for Action Recognition
- 25 June 2014
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Circuits and Systems for Video Technology
- Vol. 25 (1), 77-86
- https://doi.org/10.1109/tcsvt.2014.2333151
Abstract
Human action recognition is valuable for numerous practical applications, e.g., gaming, video surveillance, and video search. In this paper, we hypothesize that the classification of actions can be boosted by designing a smart feature pooling strategy under the prevalently used bag-of-words-based representation. Founded on automatic video saliency analysis, we propose the spatial-temporal attention-aware pooling scheme for feature pooling. First, the video saliencies are predicted using the video saliency model; the localized spatial-temporal features are then pooled at different saliency levels, forming video-saliency-guided channels. Saliency-aware matching kernels are then derived as the similarity measurement of these channels. Intuitively, the proposed kernels calculate the similarities of the video foreground (salient areas) or background (nonsalient areas) at different levels. Finally, the kernels are fed into popular support vector machines for action classification. Extensive experiments on three popular data sets for action classification validate the effectiveness of our proposed method, which outperforms the state-of-the-art methods with 95.3% on UCF Sports (better by 4.0%) and 87.9% on the YouTube data set (better by 2.5%), and achieves comparable results on the Hollywood2 data set.
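The pooling and matching steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the kernel form, the number of saliency levels, or the normalization, so the single threshold of 0.5, the L1 normalization, the histogram intersection kernel, and the plain averaging over channels are all assumptions made for illustration. The function names are hypothetical as well. Each local feature is assumed to arrive as a visual-word index paired with a saliency value in [0, 1].

```python
from typing import List, Sequence


def saliency_channels(words: Sequence[int],
                      saliency: Sequence[float],
                      vocab_size: int,
                      thresholds: Sequence[float] = (0.5,)) -> List[List[float]]:
    """Pool visual words into one L1-normalized histogram per saliency level.

    Level 0 holds the least salient features (background); the last level
    holds the most salient ones (foreground). Thresholds are an assumption.
    """
    n_levels = len(thresholds) + 1
    hists = [[0.0] * vocab_size for _ in range(n_levels)]
    for w, s in zip(words, saliency):
        # Count how many thresholds the saliency value reaches.
        level = sum(1 for t in thresholds if s >= t)
        hists[level][w] += 1.0
    for h in hists:
        total = sum(h)
        if total > 0:
            for i in range(vocab_size):
                h[i] /= total
    return hists


def intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection, used here as an assumed per-channel kernel."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def saliency_aware_kernel(ch_a: List[List[float]],
                          ch_b: List[List[float]]) -> float:
    """Average the per-channel similarities (equal weights are an assumption)."""
    return sum(intersection(a, b) for a, b in zip(ch_a, ch_b)) / len(ch_a)


# Two toy videos over a 3-word vocabulary.
ch_a = saliency_channels([0, 1, 1, 2], [0.9, 0.8, 0.2, 0.1], vocab_size=3)
ch_b = saliency_channels([0, 0, 2, 2], [0.7, 0.6, 0.3, 0.4], vocab_size=3)

k_ab = saliency_aware_kernel(ch_a, ch_b)  # foreground and background each match by 0.5
k_aa = saliency_aware_kernel(ch_a, ch_a)  # a video compared with itself
print(k_ab, k_aa)
```

Evaluating the kernel over all pairs of training videos yields a Gram matrix, which could then be passed to any support vector machine implementation that accepts a precomputed kernel.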
Funding Information
- Ministry of Education - Singapore (MOE2012-TIF-2-G-016)