Discriminatively trained recurrent neural networks for single-channel speech separation

1 December 2014

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 577-581
https://doi.org/10.1109/globalsip.2014.7032183

Abstract

This paper describes an in-depth investigation of training criteria, network architectures and feature representations for regression-based single-channel speech separation with deep neural networks (DNNs). We use a generic discriminative training criterion corresponding to optimal source reconstruction from time-frequency masks, and introduce its application to speech separation in a reduced feature space (the Mel domain). A comparative evaluation of time-frequency mask estimation by DNNs, recurrent DNNs and non-negative matrix factorization on the 2nd CHiME Speech Separation and Recognition Challenge shows consistent improvements from discriminative training, with long short-term memory (LSTM) recurrent DNNs obtaining the best overall results. Furthermore, our results confirm the importance of fine-tuning the feature representation for DNN training.
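
The abstract contrasts training a mask estimator against a target mask with a criterion tied to source reconstruction. The sketch below illustrates that difference, assuming the reconstruction criterion takes the common "signal approximation" form: squared error between the masked mixture magnitude and the clean speech magnitude. It compares this with plain mask approximation against an ideal ratio mask. The function names, the ideal ratio mask target and the additive-magnitude mixture in the demo are illustrative assumptions, not the paper's exact formulation or code. Each column can stand for an STFT bin or, as in the Mel-domain setting, a Mel band.

```python
from typing import List

# Hypothetical representation: one row per frame, one column per
# frequency bin (linear STFT bins or Mel bands), non-negative magnitudes.
Spectrogram = List[List[float]]


def ideal_ratio_mask(speech: Spectrogram, noise: Spectrogram,
                     eps: float = 1e-12) -> Spectrogram:
    """Illustrative target mask |S| / (|S| + |N|), which lies in [0, 1]."""
    return [
        [s / (s + n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech, noise)
    ]


def mask_approximation_loss(mask_est: Spectrogram,
                            mask_target: Spectrogram) -> float:
    """Squared error between masks; every bin counts equally."""
    return sum(
        (m - t) ** 2
        for m_row, t_row in zip(mask_est, mask_target)
        for m, t in zip(m_row, t_row)
    )


def signal_approximation_loss(mask_est: Spectrogram, mixture: Spectrogram,
                              speech: Spectrogram) -> float:
    """Assumed reconstruction criterion: squared error of masked mixture vs. speech.

    Mask errors are scaled by the mixture magnitude, so mistakes in loud
    bins cost more than mistakes in quiet bins.
    """
    return sum(
        (m * y - s) ** 2
        for m_row, y_row, s_row in zip(mask_est, mixture, speech)
        for m, y, s in zip(m_row, y_row, s_row)
    )


# Toy demo: one frame, one loud bin and one quiet bin.
speech = [[3.0, 0.1]]
noise = [[1.0, 0.1]]
mixture = [[4.0, 0.2]]  # simplifying assumption: magnitudes add
target = ideal_ratio_mask(speech, noise)  # approximately [[0.75, 0.5]]

est_loud_error = [[0.65, 0.5]]   # mask off by 0.1 in the loud bin
est_quiet_error = [[0.75, 0.6]]  # mask off by 0.1 in the quiet bin

ma_loud = mask_approximation_loss(est_loud_error, target)
ma_quiet = mask_approximation_loss(est_quiet_error, target)
sa_loud = signal_approximation_loss(est_loud_error, mixture, speech)
sa_quiet = signal_approximation_loss(est_quiet_error, mixture, speech)
print(ma_loud, ma_quiet, sa_loud, sa_quiet)
```

Both estimates have the same mask-approximation error, but the signal-approximation loss penalises the error in the loud bin far more. This is one intuition for why training directly on the reconstructed signal, rather than on the mask, can improve separation.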

Cited by 160 articles