Multimodal Gesture Recognition Using 3-D Convolution and Convolutional LSTM
Open Access
- 17 March 2017
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Access
- Vol. 5, 4517-4524
- https://doi.org/10.1109/access.2017.2684186
Abstract
Gesture recognition aims to recognize meaningful movements of the human body and is of the utmost importance in intelligent human-computer/robot interaction. In this paper, we present a multimodal gesture recognition method based on 3-D convolution and convolutional long short-term memory (LSTM) networks. The proposed method first learns short-term spatiotemporal features of gestures through a 3-D convolutional neural network, and then learns long-term spatiotemporal features with convolutional LSTM networks built on the extracted short-term features. In addition, fine-tuning across multimodal data is evaluated, and we find that it can serve as an optional technique to prevent overfitting when no pre-trained models exist. The proposed method is verified on the ChaLearn LAP large-scale isolated gesture dataset (IsoGD) and the Sheffield Kinect gesture (SKIG) dataset. The results show that our method achieves state-of-the-art recognition accuracy (51.02% on the validation set of IsoGD and 98.89% on SKIG).
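The pipeline the abstract describes (3-D convolution for short-term spatiotemporal features, followed by a convolutional LSTM for long-term dynamics, then classification) can be sketched as below. This is a minimal illustrative PyTorch sketch, not the authors' actual network: the channel widths, kernel sizes, pooling, and the single-layer ConvLSTM cell are all assumptions chosen for brevity.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: LSTM gates computed by 2-D convolutions,
    so the hidden state stays a spatial feature map (illustrative)."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # One conv produces all four gates (input, forget, output, candidate).
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)   # update cell state
        h = o * torch.tanh(c)           # new hidden feature map
        return h, c


class Gesture3DConvLSTM(nn.Module):
    """Toy version of the described pipeline: 3-D conv -> ConvLSTM -> classifier."""

    def __init__(self, in_ch=3, c3d_ch=16, hid_ch=32, n_classes=10):
        super().__init__()
        # 3-D convolution learns short-term spatiotemporal features.
        self.c3d = nn.Sequential(
            nn.Conv3d(in_ch, c3d_ch, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),  # downsample space, keep time resolution
        )
        # ConvLSTM aggregates the per-frame feature maps over time.
        self.cell = ConvLSTMCell(c3d_ch, hid_ch)
        self.head = nn.Linear(hid_ch, n_classes)

    def forward(self, x):  # x: (batch, channels, frames, height, width)
        feats = self.c3d(x)
        b, ch, t, h, w = feats.shape
        hid = feats.new_zeros(b, self.cell.hid_ch, h, w)
        cell = feats.new_zeros(b, self.cell.hid_ch, h, w)
        for step in range(t):  # unroll ConvLSTM over the time axis
            hid, cell = self.cell(feats[:, :, step], (hid, cell))
        pooled = hid.mean(dim=(2, 3))  # global average pool of final hidden map
        return self.head(pooled)


model = Gesture3DConvLSTM(n_classes=10)
clip = torch.randn(2, 3, 8, 32, 32)  # 2 RGB clips, 8 frames of 32x32
logits = model(clip)
print(logits.shape)  # torch.Size([2, 10])
```

In the multimodal setting of the paper, one such stream per modality (e.g. RGB and depth) would be trained and their predictions fused; that fusion step is omitted here.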
Funding Information
- China Postdoctoral Science Foundation (2016M592763)
- Fundamental Research Funds for the Central Universities (JB161006, JB161001)
- National Natural Science Foundation of China (61401324, 61305109)