Multimodal Gesture Recognition Using 3-D Convolution and Convolutional LSTM
Open Access
- 17 March 2017
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Access
- Vol. 5, 4517-4524
- https://doi.org/10.1109/access.2017.2684186
Abstract
Gesture recognition aims to recognize meaningful movements of the human body and is of the utmost importance in intelligent human-computer/robot interaction. In this paper, we present a multimodal gesture recognition method based on 3-D convolution and convolutional long short-term memory (LSTM) networks. The proposed method first learns short-term spatiotemporal features of gestures through a 3-D convolutional neural network, and then learns long-term spatiotemporal features with convolutional LSTM networks built on the extracted short-term features. In addition, fine-tuning across multimodal data is evaluated, and we find that it can serve as an optional technique to prevent overfitting when no pre-trained models exist. The proposed method is verified on the ChaLearn LAP large-scale isolated gesture dataset (IsoGD) and the Sheffield Kinect gesture (SKIG) dataset. The results show that our method achieves state-of-the-art recognition accuracy (51.02% on the validation set of IsoGD and 98.89% on SKIG).
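The pipeline the abstract describes (3-D convolution for short-term spatiotemporal features, followed by a convolutional LSTM for long-term dynamics, then classification) can be sketched as below. This is a minimal illustrative PyTorch sketch, not the authors' actual network: the channel widths, kernel sizes, pooling, and the single-layer ConvLSTM cell are all assumptions chosen for brevity.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: LSTM gates computed by 2-D convolutions,
    so the hidden state stays a spatial feature map (illustrative)."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # One conv produces all four gates (input, forget, output, candidate).
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)   # update cell state
        h = o * torch.tanh(c)           # new hidden feature map
        return h, c


class Gesture3DConvLSTM(nn.Module):
    """Toy version of the described pipeline: 3-D conv -> ConvLSTM -> classifier."""

    def __init__(self, in_ch=3, c3d_ch=16, hid_ch=32, n_classes=10):
        super().__init__()
        # 3-D convolution learns short-term spatiotemporal features.
        self.c3d = nn.Sequential(
            nn.Conv3d(in_ch, c3d_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),  # downsample space, keep time resolution
        )
        # ConvLSTM aggregates the per-frame feature maps over time.
        self.cell = ConvLSTMCell(c3d_ch, hid_ch)
        self.head = nn.Linear(hid_ch, n_classes)

    def forward(self, x):  # x: (batch, channels, frames, height, width)
        feats = self.c3d(x)
        b, ch, t, h, w = feats.shape
        hid = feats.new_zeros(b, self.cell.hid_ch, h, w)
        cell = feats.new_zeros(b, self.cell.hid_ch, h, w)
        for step in range(t):  # unroll ConvLSTM over the time axis
            hid, cell = self.cell(feats[:, :, step], (hid, cell))
        pooled = hid.mean(dim=(2, 3))  # global average pool of final hidden map
        return self.head(pooled)


model = Gesture3DConvLSTM(n_classes=10)
clip = torch.randn(2, 3, 8, 32, 32)  # 2 RGB clips, 8 frames of 32x32
logits = model(clip)
print(logits.shape)  # torch.Size([2, 10])
```

In the multimodal setting of the paper, one such stream per modality (e.g. RGB and depth) would be trained and their predictions fused; that fusion step is omitted here.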
Funding Information
- China Postdoctoral Science Foundation (2016M592763)
- Fundamental Research Funds for the Central Universities (JB161006, JB161001)
- National Natural Science Foundation of China (61401324, 61305109)