Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition
Open Access
- 2 March 2016
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Pattern Analysis and Machine Intelligence
- Vol. 38 (8), 1583-1597
- https://doi.org/10.1109/tpami.2016.2537340
Abstract
This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, in which skeleton joint information, depth images, and RGB images are the multimodal input observations. Unlike most traditional approaches, which rely on the construction of complex handcrafted features, our approach learns high-level spatiotemporal representations using deep neural networks suited to the input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, opening the door to the use of deep learning techniques to further explore multimodal time series data.
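The abstract describes an HMM whose emission probabilities come from neural networks, evaluated with a Jaccard index. The sketch below illustrates that general idea on a toy two-state problem and is not the authors' implementation. It assumes a hypothetical frame classifier whose posteriors are hard-coded in `posteriors`, converts them to scaled likelihoods by dividing by an assumed state prior (the usual hybrid NN-HMM convention), and decodes with standard Viterbi. The transition matrix, priors, and the simplified frame-level `frame_jaccard` helper are all illustrative assumptions; the challenge's exact scoring protocol is not reproduced here.

```python
import numpy as np

def viterbi(log_emission, log_trans, log_prior):
    """Return the most likely state path.

    log_emission: (T, S) per-frame log emission scores
    log_trans:    (S, S) log transition matrix, row = from, column = to
    log_prior:    (S,)   log initial state distribution
    """
    T, S = log_emission.shape
    score = log_prior + log_emission[0]
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans      # cand[i, j]: from state i to state j
        backptr[t] = np.argmax(cand, axis=0)   # best predecessor for each state j
        score = cand[backptr[t], np.arange(S)] + log_emission[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):              # follow back-pointers to frame 0
        path[t - 1] = backptr[t, path[t]]
    return path

def frame_jaccard(pred, truth, label):
    """Frame-level Jaccard index (intersection over union) for one label."""
    p, g = pred == label, truth == label
    union = np.logical_or(p, g).sum()
    return float(np.logical_and(p, g).sum() / union) if union else 0.0

# Hypothetical network posteriors p(state | frame); state 0 = rest, 1 = gesture.
posteriors = np.array([
    [0.95, 0.05],
    [0.80, 0.20],
    [0.30, 0.70],
    [0.60, 0.40],   # a noisy frame inside the gesture
    [0.20, 0.80],
    [0.10, 0.90],
])
state_prior = np.array([0.5, 0.5])             # assumed state frequencies
trans = np.array([[0.9, 0.1],
                  [0.1, 0.9]])                 # assumed "sticky" transitions

# Scaled likelihood: p(x | s) is proportional to p(s | x) / p(s).
log_emission = np.log(posteriors) - np.log(state_prior)
path = viterbi(log_emission, np.log(trans), np.log(state_prior))
framewise = posteriors.argmax(axis=1)          # decoding without the HMM

truth = np.array([0, 1, 1, 1, 1, 1])           # made-up ground truth
print("viterbi  :", path.tolist(), frame_jaccard(path, truth, 1))
print("framewise:", framewise.tolist(), frame_jaccard(framewise, truth, 1))
```

In this toy setting the transition model smooths over the single noisy frame that independent per-frame classification would mislabel, which is the motivation for decoding a whole sequence jointly rather than frame by frame.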
Funding Information
- European Union's Horizon 2020 Research and Innovation Programme
- Marie Sklodowska-Curie (657679)
- BMBF (01GQ1115)
- National Natural Science Foundation of China (61528106)
This publication has 42 references indexed in Scilit:
- ModDrop: Adaptive Multi-Modal Gesture Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015
- Multi-modal Gesture Recognition Using Skeletal Joints and Motion Trail Model. Published by Springer Science and Business Media LLC, 2015
- Multi-modality Gesture Detection and Recognition with Un-supervision, Randomization and Discrimination. Lecture Notes in Computer Science, 2015
- Sign Language Recognition Using Convolutional Neural Networks. Published by Springer Science and Business Media LLC, 2015
- Deep Dynamic Neural Networks for Gesture Segmentation and Recognition. Published by Springer Science and Business Media LLC, 2015
- ChaLearn gesture challenge: Design and first results. Published by Institute of Electrical and Electronics Engineers (IEEE), 2012
- Instructing people for training gestural interactive systems. Published by Association for Computing Machinery (ACM), 2012
- HMDB: A large video database for human motion recognition. Published by Institute of Electrical and Electronics Engineers (IEEE), 2011
- Connectionist probability estimators in HMM speech recognition. IEEE Transactions on Speech and Audio Processing, 1994
- Connectionist Speech Recognition. Published by Springer Science and Business Media LLC, 1994