Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM
- 1 December 2012
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- pp. 131-136
- https://doi.org/10.1109/slt.2012.6424210
Abstract
The context-dependent deep neural network hidden Markov model (CD-DNN-HMM) is a recently proposed acoustic model that has significantly outperformed Gaussian mixture model (GMM)-HMM systems in many large vocabulary speech recognition (LVSR) tasks. In this paper, we present our strategy of using mixed-bandwidth training data to improve wideband speech recognition accuracy in the CD-DNN-HMM framework. We show that DNNs provide the flexibility of using arbitrary features. By using Mel-scale log filter-bank features, we not only achieve higher recognition accuracy than with MFCCs but can also formulate the mixed-bandwidth training problem as a missing feature problem, in which the feature dimensions corresponding to the high-frequency bands have no value when narrowband speech is presented. This treatment makes training CD-DNN-HMMs with mixed-bandwidth data an easy task, since no bandwidth extension is needed. Our experiments on voice search data indicate that the proposed solution not only provides higher recognition accuracy for wideband speech but also allows the same CD-DNN-HMM to recognize mixed-bandwidth speech. By exploiting mixed-bandwidth training data, the CD-DNN-HMM outperforms the fMPE+BMMI-trained GMM-HMM, which cannot benefit from using narrowband data, by 18.4%.
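The missing-feature formulation can be illustrated with a small sketch. The abstract does not give the number of filters or how the empty dimensions are filled, so the following are assumptions for illustration only: an HTK-style mel scale, a hypothetical 24-filter bank spanning 0-8 kHz (16 kHz sampling), narrowband audio (8 kHz sampling) that carries nothing above 4 kHz, and zero as the fill value for unobserved dimensions. The helper names `mel_band_edges`, `observed_mask`, `embed_narrowband` and `dct2` are hypothetical. The last part contrasts this with MFCCs: because the DCT mixes all filter-bank outputs, a change in the top band alone moves every cepstral coefficient, so the missing high bands can no longer be isolated as separate dimensions.

```python
import math

# HTK-style mel scale (assumption: the paper's exact filter-bank design is not given here).
def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(num_bands, low_hz, high_hz):
    """(lower, center, upper) edges in Hz of equally mel-spaced triangular filters."""
    lo, hi = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (hi - lo) / (num_bands + 1)
    pts = [mel_to_hz(lo + i * step) for i in range(num_bands + 2)]
    return [(pts[i], pts[i + 1], pts[i + 2]) for i in range(num_bands)]

def observed_mask(bands, nb_cutoff_hz):
    """True for wideband filters lying entirely below the narrowband cutoff."""
    return [upper <= nb_cutoff_hz for (_, _, upper) in bands]

def embed_narrowband(nb_feats, mask, fill_value=0.0):
    """Place narrowband log filter-bank values into the wideband feature layout.

    Unobserved (high-frequency) dimensions receive fill_value; zero is a
    hypothetical choice, not necessarily what the paper does.
    """
    if len(nb_feats) != sum(mask):
        raise ValueError("narrowband vector must match the number of observed dims")
    it = iter(nb_feats)
    return [next(it) if obs else fill_value for obs in mask]

def dct2(x):
    """Unnormalized DCT-II, the step that turns log filter-bank outputs into MFCCs."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
            for k in range(n)]

# Hypothetical setup: 24 filters over 0-8 kHz; narrowband input stops at 4 kHz.
NUM_BANDS = 24
bands = mel_band_edges(NUM_BANDS, 0.0, 8000.0)
mask = observed_mask(bands, 4000.0)
n_obs = sum(mask)

nb_frame = [float(i) for i in range(n_obs)]  # stand-in log energies for one frame
wb_frame = embed_narrowband(nb_frame, mask)
print(n_obs, "of", NUM_BANDS, "dims observed for narrowband input")

# Contrast with MFCCs: perturbing only the top filter-bank output
# changes every cepstral coefficient after the DCT.
base = [1.0] * NUM_BANDS
bumped = base[:-1] + [2.0]
changed = [abs(a - b) > 1e-9 for a, b in zip(dct2(base), dct2(bumped))]
print("cepstral coefficients affected:", sum(changed), "of", NUM_BANDS)
```

With log filter-bank inputs, a wideband frame fills all dimensions and a narrowband frame fills only the low-frequency prefix, so both can be fed to the same network without any bandwidth-extension step.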