Robust Audio-Visual Speech Recognition Based on Late Integration

13 June 2008

journal article
research article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Multimedia

Vol. 10 (5), 767-779
https://doi.org/10.1109/tmm.2008.922789

Abstract

Audio-visual speech recognition (AVSR), which uses both the acoustic and the visual signals of speech, has received attention because of its robustness in noisy environments. In this paper, we present an AVSR system based on a late integration scheme, whose robustness under various noise conditions is improved by enhancing each of the three parts composing the system. First, we improve the performance of the visual subsystem by applying a stochastic optimization method to the hidden Markov models used as the speech recognizer. Second, we propose a new method of considering the dynamic characteristics of speech to improve the robustness of the acoustic subsystem. Third, the acoustic and visual subsystems are integrated by neural networks to produce robust final recognition results. We demonstrate the performance of the proposed methods via speaker-independent isolated word recognition experiments. The results show that the proposed system is more robust than the conventional system under various noise conditions, without a priori knowledge about the noise contained in the speech.
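
As a rough illustration of what late integration means in general, the Python sketch below combines the per-word scores of two independent recognizers after each has scored the utterance on its own. It is only a minimal sketch under stated assumptions: the function name `fuse_late`, the example scores, and the fixed hand-chosen `audio_weight` are all hypothetical, and the weighted sum of log-likelihoods is a common textbook form of decision fusion rather than the method of the paper, whose abstract says only that the integration is done by neural networks.

```python
def fuse_late(audio_loglik, visual_loglik, audio_weight):
    """Fuse per-word log-likelihoods from two recognizers (hypothetical helper).

    audio_loglik, visual_loglik: dicts mapping each candidate word to the
    log-likelihood assigned by the acoustic and the visual recognizer.
    audio_weight: value in [0, 1]; the visual stream gets 1 - audio_weight.
    Assumption: the weight is fixed by hand here, not produced by a network.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_loglik.keys() != visual_loglik.keys():
        raise ValueError("both recognizers must score the same vocabulary")

    visual_weight = 1.0 - audio_weight
    # Weighted sum of the two streams' scores for every candidate word.
    fused = {
        word: audio_weight * audio_loglik[word] + visual_weight * visual_loglik[word]
        for word in audio_loglik
    }
    # The recognized word is the one with the highest fused score.
    best_word = max(fused, key=fused.get)
    return best_word, fused


# Made-up scores for a two-word vocabulary.
audio_scores = {"yes": -12.0, "no": -11.5}
visual_scores = {"yes": -8.0, "no": -10.0}

# Trusting the audio stream (e.g. clean speech) follows the acoustic decision.
clean_word, _ = fuse_late(audio_scores, visual_scores, audio_weight=0.9)

# Trusting the visual stream (e.g. noisy speech) follows the visual decision.
noisy_word, _ = fuse_late(audio_scores, visual_scores, audio_weight=0.3)
```

With these made-up numbers, the decision follows the acoustic recognizer when the audio weight is high and the visual recognizer when it is low, which is why the choice of weight matters for robustness under unknown noise.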

Cited by 37 articles