Neural networks for statistical recognition of continuous speech
- 1 May 1995
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in Proceedings of the IEEE
- Vol. 83 (5), 742-772
- https://doi.org/10.1109/5.381844
Abstract
In recent years there has been a significant body of work, both theoretical and experimental, that has established the viability of artificial neural networks (ANN's) as a useful technology for speech recognition. It has been shown that neural networks can be used to augment speech recognizers whose underlying structure is essentially that of hidden Markov models (HMM's). In particular, we have demonstrated that fairly simple layered structures, which we lately have termed big dumb neural networks (BDNN's), can be discriminatively trained to estimate emission probabilities for an HMM. Recently, simple speech recognition systems (using context-independent phone models) based on this approach have been shown on controlled tests to be both effective in terms of accuracy (i.e., comparable to or better than equivalent state-of-the-art systems) and efficient in terms of CPU and memory run-time requirements. Research is continuing on extending these results to somewhat more complex systems. In this paper, we first give a brief overview of automatic speech recognition (ASR) and statistical pattern recognition in general. We also include a very brief review of HMM's, and then describe the use of ANN's as statistical estimators. We then review the basic principles of our hybrid HMM/ANN approach and describe some experiments. We discuss some current research topics, including new theoretical developments in training ANN's to maximize the posterior probabilities of the correct models for speech utterances. We also discuss some issues of system resources required for training and recognition. Finally, we conclude with some perspectives about fundamental limitations in the current technology and some speculations about where we can go from here.
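The abstract's central idea, a network that supplies emission probabilities to an HMM, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the network outputs per-frame state posteriors P(q|x), divides them by state priors P(q) to obtain scaled likelihoods (proportional to p(x|q) by Bayes' rule), and decodes with a standard Viterbi search. The function names, the probability floor, and the data layout are all hypothetical choices made for illustration.

```python
import math

def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Turn per-frame posteriors P(q|x) into log(P(q|x) / P(q)).

    By Bayes' rule this equals log p(x|q) up to a per-frame constant,
    so it can stand in for an HMM emission score. `floor` (an assumed
    value) avoids taking the log of zero.
    """
    return [
        [math.log(max(p, floor)) - math.log(max(pr, floor))
         for p, pr in zip(frame, priors)]
        for frame in posteriors
    ]

def viterbi(log_emis, log_trans, log_init):
    """Return the most likely state sequence given log-domain scores.

    log_emis[t][s]  : emission score of state s at frame t
    log_trans[r][s] : log transition probability from state r to s
    log_init[s]     : log initial probability of state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emis[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for frame in log_emis[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states),
                       key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + frame[s])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

In use, the posteriors would come from a trained network and the priors from state frequencies in the training alignment; here both are simply passed in as lists, so any estimator that produces normalized posteriors could be substituted.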