Discriminatively trained recurrent neural networks for single-channel speech separation
- 1 December 2014
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
This paper describes an in-depth investigation of training criteria, network architectures and feature representations for regression-based single-channel speech separation with deep neural networks (DNNs). We use a generic discriminative training criterion corresponding to optimal source reconstruction from time-frequency masks, and introduce its application to speech separation in a reduced feature space (Mel domain). A comparative evaluation of time-frequency mask estimation by DNNs, recurrent DNNs and non-negative matrix factorization on the 2nd CHiME Speech Separation and Recognition Challenge shows consistent improvements by discriminative training, whereas long short-term memory recurrent DNNs obtain the overall best results. Furthermore, our results confirm the importance of fine-tuning the feature representation for DNN training.
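As a rough illustration of what a criterion "corresponding to optimal source reconstruction from time-frequency masks" can look like, the sketch below scores a mask by the squared error between the masked mixture and the clean speech magnitudes. It is a minimal sketch under stated assumptions, not the paper's exact objective: the function names `ratio_mask` and `reconstruction_loss` are hypothetical, the toy spectrogram values are made up, the magnitudes are assumed to add, and the Mel-domain variant is not covered.

```python
import numpy as np

def ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Ratio mask in [0, 1] computed from known speech and noise magnitudes."""
    return speech_mag / (speech_mag + noise_mag + eps)

def reconstruction_loss(mask, mixture_mag, speech_mag):
    """Squared error between the masked mixture and the clean speech magnitudes.

    Assumption: plain squared error on magnitudes, chosen for illustration.
    """
    return float(np.sum((mask * mixture_mag - speech_mag) ** 2))

# Toy magnitude spectrograms (frequency bins x frames); values are illustrative only.
speech = np.array([[1.0, 0.0], [2.0, 1.0]])
noise = np.array([[1.0, 1.0], [0.0, 3.0]])
mixture = speech + noise  # assumes magnitudes add, which is only an approximation

oracle = ratio_mask(speech, noise)     # mask built from the true sources
flat = np.full_like(mixture, 0.5)      # uninformed baseline mask

loss_oracle = reconstruction_loss(oracle, mixture, speech)
loss_flat = reconstruction_loss(flat, mixture, speech)
print(loss_oracle, loss_flat)
```

In this toy setting the mask built from the true sources reconstructs the speech almost exactly, while the constant mask leaves a clearly larger error. In a trained system the mask would be a network output and this loss would be minimized over the network parameters.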
This publication has 19 references indexed in Scilit:
- Feature enhancement by deep LSTM networks for ASR in reverberant multisource environments. Computer Speech & Language, 2014
- Mask-based enhancement for very low quality speech. Published by Institute of Electrical and Electronics Engineers (IEEE), 2014
- Coupled dictionary training for exemplar-based speech enhancement. Published by Institute of Electrical and Electronics Engineers (IEEE), 2014
- Deep learning for monaural speech separation. Published by Institute of Electrical and Electronics Engineers (IEEE), 2014
- Speech recognition with deep recurrent neural networks. Published by Institute of Electrical and Electronics Engineers (IEEE), 2013
- Ideal ratio mask estimation using deep neural networks for robust speech recognition. Published by Institute of Electrical and Electronics Engineers (IEEE), 2013
- The second ‘chime’ speech separation and recognition challenge: Datasets, tasks and baselines. Published by Institute of Electrical and Electronics Engineers (IEEE), 2013
- Learning to Forget: Continual Prediction with LSTM. Neural Computation, 2000
- Long Short-Term Memory. Neural Computation, 1997
- Perceptual linear predictive (PLP) analysis of speech. The Journal of the Acoustical Society of America, 1990