Improving the Recognition Performance of Lip Reading Using the Concatenated Three Sequence Keyframe Image Technique
Open Access
- 11 April 2021
- journal article
- research article
- Published in Engineering, Technology & Applied Science Research
- Vol. 11 (2), 6986-6992
- https://doi.org/10.48084/etasr.4102
Abstract
This paper proposes a lip reading method based on convolutional neural networks applied to the Concatenated Three Sequence Keyframe Image (C3-SKI), which consists of (a) the Start-Lip Image (SLI), (b) the Middle-Lip Image (MLI), and (c) the End-Lip Image (ELI), which marks the end of the pronunciation of the syllable. The lip area’s image dimensions were reduced to 32×32 pixels per image frame, and three keyframes concatenated together were used to represent one syllable with a dimension of 96×32 pixels for visual speech recognition. The three concatenated keyframes representing a syllable are selected based on the relative maximum and relative minimum of the open lip’s width and height. The evaluation of the model’s effectiveness showed accuracy, validation accuracy, loss, and validation loss values of 95.06%, 86.03%, 4.61%, and 9.04%, respectively, for the THDigits dataset. The C3-SKI technique was also applied to the AVDigits dataset, showing 85.62% accuracy. In conclusion, the C3-SKI technique could be applied to perform lip reading recognition.
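The concatenation step described in the abstract can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists and a left-to-right layout (SLI, MLI, ELI), which would give the stated 96×32 image; the abstract does not specify the layout or the data representation, and the names `concat_keyframes` and `constant_frame` are hypothetical. Keyframe selection from the lip width and height extrema is not shown, since the abstract does not give its details.

```python
FRAME_SIZE = 32  # each lip keyframe is 32x32 pixels, per the abstract


def concat_keyframes(sli, mli, eli):
    """Join the start, middle, and end lip images side by side.

    Assumption: frames are placed left to right, so the result has
    32 rows (height) and 96 columns (width).
    """
    for frame in (sli, mli, eli):
        if len(frame) != FRAME_SIZE or any(len(row) != FRAME_SIZE for row in frame):
            raise ValueError("each keyframe must be 32x32 pixels")
    # Concatenate the three frames row by row.
    return [s_row + m_row + e_row for s_row, m_row, e_row in zip(sli, mli, eli)]


def constant_frame(value):
    """Build a placeholder 32x32 frame (hypothetical data, not from any dataset)."""
    return [[value] * FRAME_SIZE for _ in range(FRAME_SIZE)]


c3ski = concat_keyframes(constant_frame(0), constant_frame(128), constant_frame(255))
height, width = len(c3ski), len(c3ski[0])
print(width, height)
```

With NumPy arrays, the same layout could be obtained with `np.hstack`; plain lists are used here only to keep the sketch free of dependencies.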