Finger Gesture Spotting from Long Sequences Based on Multi-Stream Recurrent Neural Networks
Open Access
- 18 January 2020
- Vol. 20 (2), 528
- https://doi.org/10.3390/s20020528
Abstract
Gesture spotting is an essential task for recognizing finger gestures used to control in-car touchless interfaces. Automated methods for this task must detect the video segments in which gestures occur, discard natural hand movements that may resemble target gestures, and work online. In this paper, we address these challenges with a recurrent neural architecture for online finger gesture spotting. We propose a multi-stream network that merges hand and hand-location features, which helps discriminate target gestures from natural hand movements, since the two may not occur in the same 3D spatial location. Our multi-stream recurrent neural network (RNN) recurrently learns semantic information, which allows it to spot gestures online in long untrimmed video sequences. To validate our method, we collected a finger gesture dataset in an in-vehicle scenario of an autonomous car: 226 videos with more than 2100 continuous instances, captured with a depth sensor. On this dataset, our gesture spotting approach outperforms state-of-the-art methods, improving recall and precision by about 10% and 15%, respectively. Furthermore, we demonstrate that, combined with an existing gesture classifier (a 3D convolutional neural network), our proposal achieves better performance than previous hand gesture recognition methods.
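The idea of merging a hand stream and a hand-location stream for online spotting can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes two plain Elman-style recurrent streams with random, untrained weights, a simple concatenation as the merge step, and a fixed 0.5 threshold to turn per-frame scores into segments. All function names, feature dimensions, and the threshold are hypothetical choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_stream(in_dim, hid_dim):
    """Random (untrained) weights for one simple recurrent stream."""
    return {
        "W_x": rng.normal(0, 0.1, (hid_dim, in_dim)),
        "W_h": rng.normal(0, 0.1, (hid_dim, hid_dim)),
        "b": np.zeros(hid_dim),
    }


def step(stream, x, h):
    """One recurrent update: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return np.tanh(stream["W_x"] @ x + stream["W_h"] @ h + stream["b"])


def spot_online(hand_feats, loc_feats, hand_stream, loc_stream, w_out, b_out):
    """Yield one gesture probability per frame, using only past and current frames."""
    h_hand = np.zeros(hand_stream["b"].shape)
    h_loc = np.zeros(loc_stream["b"].shape)
    for x_hand, x_loc in zip(hand_feats, loc_feats):
        h_hand = step(hand_stream, x_hand, h_hand)
        h_loc = step(loc_stream, x_loc, h_loc)
        # Assumed merge step: concatenate the hidden states of both streams.
        merged = np.concatenate([h_hand, h_loc])
        yield 1.0 / (1.0 + np.exp(-(w_out @ merged + b_out)))


def to_segments(probs, threshold=0.5):
    """Group consecutive above-threshold frames into (start, end) pairs, end exclusive."""
    segments, start = [], None
    for t, p in enumerate(probs):
        if p >= threshold and start is None:
            start = t
        elif p < threshold and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(probs)))
    return segments
```

Because `spot_online` is a generator that consumes one frame at a time, it never looks ahead, which is what makes the sketch online in the sense used in the abstract. For example, `to_segments([0.1, 0.9, 0.8, 0.2, 0.7])` groups frames 1 to 2 and frame 4 into the segments `[(1, 3), (4, 5)]`.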