Deep neural networks for single channel source separation
- 1 May 2014
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 3734-3738
- https://doi.org/10.1109/icassp.2014.6854299
Abstract
In this paper, a novel approach for single channel source separation (SCSS) using a deep neural network (DNN) architecture is introduced. Unlike previous studies, in which DNNs and other classifiers were used to classify time-frequency bins and obtain hard masks for each source, we use the DNN to classify estimated source spectra and check their validity during separation. In the training stage, the training data for the source signals are used to train a DNN. In the separation stage, the trained DNN is used to aid the estimation of each source in the mixed signal. The single channel source separation problem is formulated as an energy minimization problem in which each source spectrum estimate is encouraged to fit the trained DNN model and the mixed signal spectrum is encouraged to be written as a weighted sum of the estimated source spectra. The proposed approach works regardless of the energy scale differences between the source signals in the training and separation stages. Nonnegative matrix factorization (NMF) is used to initialize the DNN estimate for each source. The experimental results show that using a DNN initialized by NMF for source separation improves the quality of the separated signal compared with using NMF alone for source separation.
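The NMF step that the abstract uses both as the initializer and as the comparison baseline can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the NMF cost function, the number of bases, or the masking rule, so the KL-divergence multiplicative updates, the rank of 8, the soft mask, the synthetic spectra, and the names `nmf_kl` and `separate` are all assumptions made for illustration. The DNN energy minimization itself is not reproduced here.

```python
import numpy as np

def nmf_kl(V, rank, n_iter=200, W=None, seed=0, eps=1e-12):
    """Factor V ~= W @ H with multiplicative updates (KL divergence, an assumption).

    If W is supplied it is held fixed and only the gains H are updated.
    """
    rng = np.random.default_rng(seed)
    update_w = W is None
    if update_w:
        W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((W.shape[1], V.shape[1])) + eps
    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)
        if update_w:
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (ones @ H.T + eps)
    return W, H

def separate(Y, W1, W2, n_iter=200, eps=1e-12):
    """Split mixture magnitudes Y using fixed, pre-trained bases W1 and W2."""
    W = np.hstack([W1, W2])
    _, H = nmf_kl(Y, W.shape[1], n_iter=n_iter, W=W)
    k = W1.shape[1]
    S1 = W1 @ H[:k]               # initial estimate of source 1
    S2 = W2 @ H[k:]               # initial estimate of source 2
    mask1 = S1 / (S1 + S2 + eps)  # soft mask with values in [0, 1]
    return mask1 * Y, (1.0 - mask1) * Y

# Synthetic nonnegative "spectrograms" (frequency bins x frames) stand in
# for real training data; they are placeholders, not results from the paper.
rng = np.random.default_rng(1)
X1 = rng.random((64, 100))
X2 = rng.random((64, 100))

# Training stage: learn one set of bases per source.
W1, _ = nmf_kl(X1, rank=8)
W2, _ = nmf_kl(X2, rank=8)

# Separation stage: the mixture uses source gains that differ from training.
Y = 0.3 * X1[:, :20] + 2.0 * X2[:, :20]
S1_hat, S2_hat = separate(Y, W1, W2)
```

Because the gains `H` are re-estimated on the mixture while the bases stay fixed, the sketch tolerates a change in source energy between training and separation, which is the same property the abstract claims for the proposed approach. In the paper's pipeline, estimates such as `S1_hat` and `S2_hat` would serve only as the starting point that the DNN-based energy minimization then refines.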