Neural networks for statistical recognition of continuous speech

Abstract
In recent years a significant body of work, both theoretical and experimental, has established the viability of artificial neural networks (ANNs) as a useful technology for speech recognition. It has been shown that neural networks can be used to augment speech recognizers whose underlying structure is essentially that of hidden Markov models (HMMs). In particular, we have demonstrated that fairly simple layered structures, which we have termed big dumb neural networks (BDNNs), can be discriminatively trained to estimate emission probabilities for an HMM. Recently, simple speech recognition systems based on this approach (using context-independent phone models) have been shown in controlled tests to be both effective in terms of accuracy (i.e., comparable to or better than equivalent state-of-the-art systems) and efficient in terms of CPU and memory run-time requirements. Research is continuing on extending these results to somewhat more complex systems. In this paper, we first give a brief overview of automatic speech recognition (ASR) and statistical pattern recognition in general. We also include a very brief review of HMMs and then describe the use of ANNs as statistical estimators. We then review the basic principles of our hybrid HMM/ANN approach and describe some experiments. We discuss some current research topics, including new theoretical developments in training ANNs to maximize the posterior probabilities of the correct models for speech utterances. We also discuss issues of the system resources required for training and recognition. Finally, we conclude with some perspectives on fundamental limitations of the current technology and some speculations about where we can go from here.
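The core idea of the hybrid HMM/ANN approach described above — a discriminatively trained network estimating emission probabilities for an HMM — can be sketched briefly. A network trained on framewise phone targets outputs posteriors P(q|x); by Bayes' rule, dividing by the class priors P(q) yields scaled likelihoods p(x|q)/p(x), which can serve as HMM emission scores. The code below is a minimal illustration of that conversion only; the posterior and prior values are made-up placeholders, not results from the paper.

```python
import numpy as np

def scaled_likelihoods(posteriors, priors):
    """Convert per-frame phone posteriors P(q|x) to scaled emission
    likelihoods p(x|q)/p(x) = P(q|x)/P(q) via Bayes' rule."""
    return posteriors / priors

# Illustrative values only (3 phone classes, 2 frames):
posteriors = np.array([[0.7, 0.2, 0.1],   # network outputs for frame 1
                       [0.1, 0.8, 0.1]])  # network outputs for frame 2
priors = np.array([0.5, 0.3, 0.2])        # relative class frequencies
                                          # observed in the training data

scores = scaled_likelihoods(posteriors, priors)
```

In a full recognizer, these scaled likelihoods would replace the usual Gaussian-mixture emission densities inside the HMM's Viterbi decoding; the unknown scaling factor 1/p(x) is constant per frame and so does not affect the best path.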
