Improving the Recognition Performance of Lip Reading Using the Concatenated Three Sequence Keyframe Image Technique

Open Access

11 April 2021

Research article
Published in Engineering, Technology & Applied Science Research

Vol. 11 (2), 6986-6992
https://doi.org/10.48084/etasr.4102

Abstract

This paper proposes a lip reading method based on convolutional neural networks applied to the Concatenated Three Sequence Keyframe Image (C3-SKI). The C3-SKI consists of (a) the Start-Lip Image (SLI), (b) the Middle-Lip Image (MLI), and (c) the End-Lip Image (ELI), which marks the end of the syllable's pronunciation. Each lip-region image frame was reduced to 32×32 pixels, and three such keyframes were concatenated to represent one syllable as a single 96×32-pixel image for visual speech recognition. The three keyframes for each syllable are selected from the relative maxima and relative minima of the open lip's width and height. On the THDigits dataset, the model achieved accuracy, validation accuracy, loss, and validation loss of 95.06%, 86.03%, 4.61%, and 9.04%, respectively. When applied to the AVDigits dataset, the C3-SKI technique achieved 85.62% accuracy. In conclusion, the C3-SKI technique can be applied to lip reading recognition.
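The abstract describes how one syllable becomes a single 96×32 input: pick three keyframes from extrema of the lip opening, shrink each to 32×32, and place them side by side. The sketch below illustrates that pipeline under stated assumptions. The abstract does not say which extremum corresponds to which keyframe. This sketch assumes the MLI is the frame of maximum lip opening, and the SLI and ELI are the nearest relative minima before and after it. It also assumes a per-frame "opening" score (for example, lip height × width) has already been measured, uses nearest-neighbour resizing as a stand-in for whatever resizing the authors used, and tiles the frames horizontally. The function names `select_keyframes`, `resize_nearest`, and `build_c3ski` are hypothetical.

```python
import numpy as np


def select_keyframes(opening):
    """Return (start, middle, end) frame indices for one syllable.

    Hypothetical rule (not confirmed by the paper): the middle keyframe is
    the frame with the largest lip opening; the start and end keyframes are
    the nearest relative minima before and after it, falling back to the
    first/last frame when no such minimum exists.
    """
    opening = np.asarray(opening, dtype=float)
    n = len(opening)
    mid = int(np.argmax(opening))

    def is_rel_min(i):
        # Out-of-range neighbours count as +inf, so end frames can be minima.
        left = opening[i - 1] if i > 0 else np.inf
        right = opening[i + 1] if i < n - 1 else np.inf
        return opening[i] <= left and opening[i] <= right

    start = next((i for i in range(mid - 1, -1, -1) if is_rel_min(i)), 0)
    end = next((i for i in range(mid + 1, n) if is_rel_min(i)), n - 1)
    return start, mid, end


def resize_nearest(img, size=32):
    """Nearest-neighbour resize of a 2-D (or H x W x C) image to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows[:, None], cols]


def build_c3ski(frames, opening, size=32):
    """Tile the three keyframes left to right into one (size, 3*size) image."""
    s, m, e = select_keyframes(opening)
    tiles = [resize_nearest(frames[i], size) for i in (s, m, e)]
    return np.hstack(tiles)  # 32 rows x 96 columns for size=32


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Six synthetic grayscale lip crops and a made-up opening curve.
    frames = [rng.integers(0, 256, size=(60, 80), dtype=np.uint8) for _ in range(6)]
    opening = [1, 2, 5, 3, 1, 2]
    print(select_keyframes(opening))         # (0, 2, 4)
    print(build_c3ski(frames, opening).shape)  # (32, 96)
```

Tiling horizontally yields an image 96 pixels wide and 32 pixels tall, which matches the "96×32" description if that is read as width × height. Stacking vertically with `np.vstack` would be the alternative reading.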


