Action recognition using context and appearance distribution features

1 June 2011

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 489-496
https://doi.org/10.1109/cvpr.2011.5995624

Abstract

We first propose a new spatio-temporal context distribution feature of interest points for human action recognition. Each action video is expressed as a set of relative XYT coordinates between pairs of interest points within a local region. We learn a global Gaussian mixture model (GMM), referred to as the Universal Background Model (UBM), from the relative coordinate features of all training videos, and then represent each video by the normalized parameters of a video-specific GMM adapted from the global GMM. To capture spatio-temporal relationships at different levels, we use multiple GMMs to describe the context distributions of interest points over multi-scale local regions. To describe the appearance information of an action video, we also propose using a GMM to characterize the distribution of local appearance features extracted from cuboids centered on the interest points. An action video can therefore be represented by two types of distribution features: 1) multiple GMM distributions of spatio-temporal context; 2) a GMM distribution of local video appearance. To effectively fuse these two heterogeneous and complementary types of distribution features, we further propose a new learning algorithm, called Multiple Kernel Learning with Augmented Features (AFMKL), which learns an adapted classifier based on multiple kernels and the pre-learned classifiers of other action classes. Extensive experiments on the KTH, multi-view IXMAS, and complex UCF Sports datasets demonstrate that our method generally achieves higher recognition accuracy than other state-of-the-art methods.
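The spatio-temporal context feature can be illustrated with a minimal Python sketch. It assumes each interest point is given as an (x, y, t) triple and, as a hypothetical choice, treats the "local region" around a point as an axis-aligned XYT cube of half-size `radius`; the paper's exact region shape, whether ordered or unordered pairs are used, and any coordinate normalization are not stated here. The names `relative_xyt_features` and `multiscale_context_features` are illustrative only. Calling the multi-scale helper with several radii yields one feature set per scale, each of which would then be modeled by its own GMM.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, t) location of one interest point


def relative_xyt_features(points: Sequence[Point], radius: float) -> List[Point]:
    """Relative XYT coordinates between pairs of interest points in a local region.

    Assumption: the local region of point i is a cube of half-size `radius`
    centered on it, and both ordered pairs (i, j) and (j, i) are kept.
    """
    feats: List[Point] = []
    for (xi, yi, ti), (xj, yj, tj) in permutations(points, 2):
        dx, dy, dt = xj - xi, yj - yi, tj - ti
        # Keep the pair only if point j falls inside point i's local region
        if max(abs(dx), abs(dy), abs(dt)) <= radius:
            feats.append((dx, dy, dt))
    return feats


def multiscale_context_features(
    points: Sequence[Point], radii: Sequence[float]
) -> Dict[float, List[Point]]:
    """One set of relative-coordinate features per local-region scale."""
    return {r: relative_xyt_features(points, r) for r in radii}
```

For example, with points `[(0, 0, 0), (1, 0, 0), (5, 5, 5)]` and radius 2, only the two nearby points form pairs, giving `[(1, 0, 0), (-1, 0, 0)]`; with radius 10 all six ordered pairs are kept.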
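Deriving a video-specific GMM from the global UBM is commonly done by maximum a posteriori (MAP) adaptation, as in GMM-UBM speaker verification. The sketch below assumes that technique: it adapts only the means of a diagonal-covariance UBM, keeps the weights and variances fixed, and uses a relevance factor `relevance` (16 is a value often used in the GMM-UBM literature, not one taken from this paper). The final normalization, scaling each adapted mean by sqrt(w_k) and the inverse standard deviation, is one common "GMM supervector" choice and is likewise an assumption; the paper may normalize differently. The UBM parameters are taken as given, so fitting them with EM is omitted, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """MAP-adapt the UBM means to one video's features X (assumed scheme)."""
    K, D = len(weights), len(means[0])
    n = [0.0] * K                          # soft counts n_k
    first = [[0.0] * D for _ in range(K)]  # sum_t gamma_k(x_t) * x_t
    for x in X:
        # Component posteriors gamma_k(x), using log-sum-exp for stability
        logp = [math.log(weights[k]) + log_gauss_diag(x, means[k], variances[k])
                for k in range(K)]
        top = max(logp)
        p = [math.exp(lp - top) for lp in logp]
        total = sum(p)
        for k in range(K):
            g = p[k] / total
            n[k] += g
            for d in range(D):
                first[k][d] += g * x[d]
    adapted = []
    for k in range(K):
        alpha = n[k] / (n[k] + relevance)  # data-dependent adaptation coefficient
        e_k = [f / n[k] for f in first[k]] if n[k] > 0 else list(means[k])
        # Interpolate between the video's statistics and the UBM prior mean
        adapted.append([alpha * e + (1.0 - alpha) * mu
                        for e, mu in zip(e_k, means[k])])
    return adapted


def normalized_supervector(adapted_means, weights, variances) -> List[float]:
    """Stack sqrt(w_k) * Sigma_k^{-1/2} * mu_k over components (assumed normalization)."""
    sv: List[float] = []
    for w, mu, var in zip(weights, adapted_means, variances):
        sv.extend(math.sqrt(w) * m / math.sqrt(v) for m, v in zip(mu, var))
    return sv
```

Components that see many of a video's features move toward that video's data, while rarely used components stay close to the UBM; this keeps the representation stable for videos with few interest points.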
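How AFMKL fuses the two feature types can be pictured through its decision function. The sketch below only evaluates (it does not learn) a decision value of the assumed form f(x) = Σ_m d_m Σ_i c_i k_m(x_i, x) + Σ_c β_c f_c(x) + b: a weighted sum of per-feature kernel expansions (for instance one kernel per context scale and one for appearance) plus a weighted sum of scores from pre-learned classifiers of other action classes. This form is an interpretation of the description above, not the paper's stated formulation; the RBF base kernel, the function names, and all argument names are hypothetical, and the AFMKL objective and solver that would produce d_m, c_i, β_c and b are not reproduced.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf_kernel(gamma: float) -> Kernel:
    """Gaussian (RBF) kernel; a hypothetical choice of base kernel."""
    def k(a: Vector, b: Vector) -> float:
        return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return k


def augmented_mkl_decision(
    x_views: List[Vector],            # test video, one representation per kernel
    train_views: List[List[Vector]],  # train_views[m][i]: training video i for kernel m
    dual_coefs: List[float],          # c_i = alpha_i * y_i, shared across kernels
    kernels: List[Kernel],
    kernel_weights: List[float],      # d_m, one nonnegative weight per kernel
    prelearned_scores: List[float],   # f_c(x) from classifiers of other action classes
    beta: List[float],                # weights on the pre-learned scores
    bias: float,
) -> float:
    value = bias
    # Multiple-kernel part: sum over m of d_m * sum_i c_i * k_m(x_i, x)
    for m, (k, d) in enumerate(zip(kernels, kernel_weights)):
        value += d * sum(c * k(xi, x_views[m])
                         for xi, c in zip(train_views[m], dual_coefs))
    # Augmented part: linear combination of the pre-learned classifiers' outputs
    value += sum(b * s for b, s in zip(beta, prelearned_scores))
    return value
```

Keeping the pre-learned scores as a separate linear term lets knowledge from other action classes shift the decision without being mixed into any single feature kernel.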
