Deep neural networks for single channel source separation

1 May 2014

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 3734-3738
https://doi.org/10.1109/icassp.2014.6854299

Abstract

In this paper, a novel approach for single channel source separation (SCSS) using a deep neural network (DNN) architecture is introduced. Unlike previous studies, in which DNNs and other classifiers were used to classify time-frequency bins and obtain hard masks for each source, we use the DNN to classify estimated source spectra and check their validity during separation. In the training stage, the training data for the source signals are used to train a DNN. In the separation stage, the trained DNN is used to aid in the estimation of each source in the mixed signal. The single channel source separation problem is formulated as an energy minimization problem in which each source spectrum estimate is encouraged to fit the trained DNN model and the mixed signal spectrum is encouraged to be written as a weighted sum of the estimated source spectra. The proposed approach works regardless of the energy scale differences between the source signals in the training and separation stages. Nonnegative matrix factorization (NMF) is used to initialize the DNN estimate for each source. The experimental results show that using a DNN initialized by NMF for source separation improves the quality of the separated signals compared with using NMF alone.
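The NMF step that the abstract uses both as the baseline and as the initializer can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the NMF cost function, the update rules, or how the sources are reconstructed. The sketch assumes a Euclidean cost with the standard multiplicative update, pre-trained basis matrices `B1` and `B2` for the two sources, and a soft (Wiener-like) mask for reconstruction. All function and variable names are hypothetical.

```python
import numpy as np

def nmf_gains(V, B, n_iter=200, eps=1e-12, seed=0):
    """Estimate nonnegative gains G so that V is approximated by B @ G.

    The bases B stay fixed; only G is updated, using the standard
    multiplicative update for the Euclidean cost (an assumption here).
    """
    rng = np.random.default_rng(seed)
    G = rng.random((B.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        G *= (B.T @ V) / (B.T @ B @ G + eps)
    return G

def separate(Y, B1, B2, n_iter=200, eps=1e-12):
    """Split a mixture magnitude spectrogram Y into two source estimates.

    Y  : (n_freq, n_frames) nonnegative mixture spectrogram
    B1 : (n_freq, k1) bases trained on source 1 (assumed given)
    B2 : (n_freq, k2) bases trained on source 2 (assumed given)
    """
    B = np.hstack([B1, B2])           # concatenate both dictionaries
    G = nmf_gains(Y, B, n_iter, eps)  # gains for the mixture
    k1 = B1.shape[1]
    S1 = B1 @ G[:k1]                  # initial estimate of source 1
    S2 = B2 @ G[k1:]                  # initial estimate of source 2
    mask1 = S1 / (S1 + S2 + eps)      # soft mask (assumption, not from the abstract)
    return mask1 * Y, (1.0 - mask1) * Y

# Toy usage with random nonnegative data in place of real spectrograms.
rng = np.random.default_rng(1)
B1 = rng.random((64, 8))
B2 = rng.random((64, 8))
Y = rng.random((64, 20))
X1, X2 = separate(Y, B1, B2)
```

In the approach described in the abstract, estimates such as `S1` and `S2` would serve only as the starting point; the energy minimization with the trained DNN then refines them, and that refinement is not shown here.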
