Adaptive Fusion and Category-Level Dictionary Learning Model for Multiview Human Action Recognition

Abstract
Human actions are often captured by multiple cameras (or sensors) to overcome the significant variations in viewpoint, background clutter, object speed, and motion pattern that arise in video surveillance, and action recognition systems often benefit from fusing information across camera (sensor) types. Adaptive fusion of information from multiple domains is therefore essential for multi-view human action recognition. Two widely applied fusion schemes are feature-level fusion and score-level fusion, but both leave considerable room for improvement: feature fusion and action recognition are typically computed as separate steps, and the fusion weights are fixed for every action and every camera. In this work, these limitations are addressed by developing a novel adaptive fusion and category-level dictionary learning model (AFCDL) for multi-view action recognition. AFCDL jointly learns an adaptive weight for each camera and optimizes the reconstruction of samples toward the action recognition task. To guide dictionary learning and the reconstruction of the query set (test samples), an induced set is built for each category, and a corresponding induced regularization term is added to the objective function. Extensive experiments on four public multi-view action benchmarks show that AFCDL significantly outperforms state-of-the-art methods, with 3% to 10% improvement in recognition accuracy.
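The abstract does not spell out the model's optimization, so the sketch below is only a minimal, illustrative NumPy rendering of the general idea it describes: each action class keeps one dictionary per camera view, a test sample is coded over each class's dictionaries, and the class with the smallest weight-fused reconstruction error is predicted. The ridge coder, the softmax weight heuristic, and all names (ridge_code, adaptive_weights, lam, beta) are assumptions for illustration; AFCDL itself learns the dictionaries, the per-camera weights, and the induced regularization term jointly, which is not reproduced here.

    import numpy as np

    def ridge_code(D, x, lam=0.1):
        # Ridge-regularized coding of x over dictionary D (closed form).
        # Stand-in for the paper's induced/sparsity regularizer.
        A = D.T @ D + lam * np.eye(D.shape[1])
        return np.linalg.solve(A, D.T @ x)

    def adaptive_weights(x_views, dicts, beta=5.0):
        # Heuristic per-sample view weights: views whose best class-wise
        # reconstruction error is lower get higher weight (softmax).
        # AFCDL learns such weights jointly during training instead.
        per_view_err = []
        for v, x in enumerate(x_views):
            best = min(np.linalg.norm(x - D @ ridge_code(D, x)) ** 2
                       for D in (c_dicts[v] for c_dicts in dicts))
            per_view_err.append(best)
        e = np.exp(-beta * np.asarray(per_view_err))
        return e / e.sum()

    def classify(x_views, dicts, view_weights):
        # x_views: one feature vector per camera view.
        # dicts[c][v]: dictionary (dim x n_atoms) for class c, view v.
        # view_weights[v]: fusion weight for view v (sums to 1).
        errs = []
        for c_dicts in dicts:
            err = 0.0
            for v, (D, x) in enumerate(zip(c_dicts, x_views)):
                a = ridge_code(D, x)
                err += view_weights[v] * np.linalg.norm(x - D @ a) ** 2
            errs.append(err)
        return int(np.argmin(errs))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n_classes, n_views, dim, n_atoms = 3, 2, 16, 5
        # Toy category-level dictionaries, one per class per view.
        dicts = [[rng.standard_normal((dim, n_atoms)) for _ in range(n_views)]
                 for _ in range(n_classes)]
        # A sample lying in the span of class 1's dictionaries.
        x_views = [dicts[1][v] @ rng.standard_normal(n_atoms)
                   for v in range(n_views)]
        w = adaptive_weights(x_views, dicts)
        print("predicted class:", classify(x_views, dicts, w))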
Funding Information
  • National Natural Science Foundation of China (61872270, 61572357)
  • Natural Science Foundation of Tianjin City (18JCYBJC85500)
