Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
- 13 August 2015
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE/ACM Transactions on Audio, Speech, and Language Processing
- Vol. 23 (12), 2136-2147
- https://doi.org/10.1109/taslp.2015.2468583
Abstract
Monaural source separation is important for many real-world applications. It is challenging because, with only a single channel of information available and no constraints, an infinite number of solutions are possible. In this paper, we explore joint optimization of masking functions and deep recurrent neural networks for monaural source separation tasks, including speech separation, singing voice separation, and speech denoising. The joint optimization of the deep recurrent neural networks with an extra masking layer enforces a reconstruction constraint. Moreover, we explore a discriminative criterion for training neural networks to further enhance the separation performance. We evaluate the proposed system on the TSP, MIR-1K, and TIMIT datasets for speech separation, singing voice separation, and speech denoising tasks, respectively. Our approaches achieve 2.30-4.98 dB SDR gain compared to NMF models in the speech separation task, 2.30-2.48 dB GNSDR gain and 4.32-5.42 dB GSIR gain compared to existing models in the singing voice separation task, and outperform NMF and DNN baselines in the speech denoising task.
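The abstract does not spell out the masking layer or the discriminative criterion, so the following is only a minimal sketch under stated assumptions. It assumes a soft time-frequency mask of the ratio form (each source's estimated magnitude divided by the sum of both estimates, applied to the mixture) and a squared-error objective that also penalizes similarity to the other source. The function names, the `eps` guard, the `gamma` value, and the toy numbers are all hypothetical, and plain Python lists stand in for one frame of a magnitude spectrogram.

```python
def soft_mask_layer(est1, est2, mixture, eps=1e-12):
    """Apply an assumed ratio-style soft mask to one frame of magnitudes.

    est1, est2: raw (pre-mask) network estimates for the two sources.
    mixture:    magnitude spectrum of the mixed signal for the same frame.
    The two outputs sum to the mixture in every frequency bin, which is
    the reconstruction constraint the abstract refers to.
    """
    out1, out2 = [], []
    for a, b, z in zip(est1, est2, mixture):
        total = abs(a) + abs(b) + eps  # eps avoids division by zero
        out1.append(abs(a) / total * z)
        out2.append(abs(b) / total * z)
    return out1, out2


def discriminative_loss(pred1, pred2, target1, target2, gamma=0.05):
    """Squared error to the correct source minus gamma times the squared
    error to the competing source (gamma is an illustrative value)."""
    def sq_err(p, t):
        return sum((x - y) ** 2 for x, y in zip(p, t))

    fit = sq_err(pred1, target1) + sq_err(pred2, target2)
    confusion = sq_err(pred1, target2) + sq_err(pred2, target1)
    return fit - gamma * confusion


# Toy single-frame example with three frequency bins (made-up numbers).
mixture = [1.0, 2.0, 0.5]
raw1 = [0.9, 0.2, 0.1]
raw2 = [0.3, 1.6, 0.1]
masked1, masked2 = soft_mask_layer(raw1, raw2, mixture)

# Largest per-bin deviation between the summed estimates and the mixture.
residual = max(abs(m1 + m2 - z) for m1, m2, z in zip(masked1, masked2, mixture))
```

Because the mask is part of the function that produces the outputs, a training loss computed on `masked1` and `masked2` would propagate through the mask as well, which is one way to read "joint optimization" in the abstract.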
Funding Information
- Army Research Office (W911NF-09-1-0383)