Adaptive Fusion and Category-Level Dictionary Learning Model for Multiview Human Action Recognition

17 April 2019

journal article
research article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Internet of Things Journal

Vol. 6 (6), 9280-9293
https://doi.org/10.1109/jiot.2019.2911669

Abstract

Human actions are often captured by multiple cameras (or sensors) to overcome the significant variations in viewpoint, background clutter, object speed, and motion patterns found in video surveillance, and action recognition systems often benefit from fusing multiple types of cameras (sensors). Adaptive fusion of the information from multiple domains is therefore mandatory for multi-view human action recognition. Two widely applied fusion schemes are feature-level fusion and score-level fusion. We point out that these schemes still have limitations and leave tremendous room for improvement: feature fusion and action recognition are computed separately, and the weights for each action and each camera are fixed. Previous fusion methods cannot overcome these limitations. In this work, inspired by nature, we address them for multi-view action recognition by developing a novel adaptive fusion and category-level dictionary learning model (abbreviated AFCDL). It jointly learns the adaptive weight for each camera and optimizes the reconstruction of samples for the action recognition task. To induce the dictionary learning and the reconstruction of the query set (the test samples), an induced set is built for each category, and a corresponding induced regularization term is designed for the objective function. Extensive experiments on four public multi-view action benchmarks show that AFCDL significantly outperforms state-of-the-art methods, with a 3% to 10% improvement in recognition accuracy.
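The two baseline fusion schemes named in the abstract can be illustrated with a minimal Python sketch. This is not the AFCDL model: it only shows feature-level fusion (concatenating per-camera features) and score-level fusion with fixed camera weights, the setting the abstract describes as a limitation. All function names, scores, and weights below are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def feature_level_fusion(views: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the per-camera feature vectors into one vector."""
    fused: List[float] = []
    for feats in views:
        fused.extend(feats)
    return fused


def score_level_fusion(scores: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-camera class scores.

    The weights are fixed in advance (an assumption of this sketch)
    and are normalized here so that they sum to 1.
    """
    total = sum(weights)
    n_classes = len(scores[0])
    fused = [0.0] * n_classes
    for cam_scores, w in zip(scores, weights):
        for c in range(n_classes):
            fused[c] += (w / total) * cam_scores[c]
    return fused


def predict(fused_scores: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused_scores)), key=lambda c: fused_scores[c])


# Hypothetical class scores from two cameras over three action classes.
cam_scores = [
    [0.6, 0.3, 0.1],  # camera 0 favors class 0
    [0.2, 0.7, 0.1],  # camera 1 favors class 1
]

equal_pred = predict(score_level_fusion(cam_scores, [1.0, 1.0]))
skewed_pred = predict(score_level_fusion(cam_scores, [3.0, 1.0]))
concat = feature_level_fusion([[1.0, 2.0], [3.0]])
```

With equal weights the fused decision follows camera 1, and with camera 0 weighted three times as heavily it follows camera 0. The predicted action therefore depends on the chosen weights, which is why a single fixed weighting for every action and camera can be a poor fit.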

Funding Information

National Natural Science Foundation of China (61872270, 61572357)
Natural Science Foundation of Tianjin City (18JCYBJC85500)


Cited by 169 articles