Vector-to-Vector Regression via Distributional Loss for Speech Enhancement
- 8 January 2021
- journal article
- research article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Signal Processing Letters
- Vol. 28, pp. 254-258 (ISSN 1070-9908)
- https://doi.org/10.1109/lsp.2021.3050386
Abstract
In this work, we leverage a novel distributional loss to improve vector-to-vector regression for feature-based speech enhancement (SE). The distributional loss function is based on the Kullback-Leibler divergence between a selected target distribution and a conditional distribution, learned from the data, for each coefficient in the clean speech vector given the noisy input features. A deep model with a softmax layer per coefficient is employed to parametrize the conditional distribution, and the model parameters are found by minimizing a weighted sum of the cross-entropies between its outputs and the respective target distributions. Experiments with convolutional neural networks (CNNs) on a publicly available noisy speech dataset derived from the Voice Bank corpus show consistent improvement over conventional solutions based on the mean squared error (MSE) and the least absolute deviation (LAD). Moreover, our approach compares favourably, in terms of both speech quality and intelligibility, against Mixture Density Networks (MDNs), another approach that computes parametric conditional distributions by combining Gaussian mixture models (GMMs) with a neural architecture.
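The loss summarized in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract specifies only a KL-based loss, a softmax layer per clean-speech coefficient, and a weighted sum of cross-entropies. The bin layout (`bin_centers`), the discretized-Gaussian target with width `sigma` (`gaussian_target`), the per-coefficient `weights`, and reading out an enhanced coefficient as the mean of its predicted distribution (`expected_value`) are all hypothetical choices made for illustration. The sketch relies on the identity H(p, q) = KL(p || q) + H(p): the target entropy H(p) does not depend on the model, so minimizing the cross-entropy also minimizes the KL divergence.

```python
import math
from typing import List, Sequence


def bin_centers(lo: float, hi: float, num_bins: int) -> List[float]:
    """Evenly spaced bin centers covering [lo, hi] (hypothetical quantization)."""
    width = (hi - lo) / num_bins
    return [lo + (k + 0.5) * width for k in range(num_bins)]


def gaussian_target(y: float, centers: Sequence[float], sigma: float) -> List[float]:
    """Assumed target distribution: a Gaussian centered on the clean
    coefficient y, evaluated at the bin centers and renormalized."""
    w = [math.exp(-0.5 * ((c - y) / sigma) ** 2) for c in centers]
    total = sum(w)
    return [v / total for v in w]


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax for one coefficient's output layer."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def distributional_loss(
    logits_per_coef: Sequence[Sequence[float]],
    clean: Sequence[float],
    centers: Sequence[float],
    sigma: float,
    weights: Sequence[float],
) -> float:
    """Weighted sum over coefficients of the cross-entropy H(p_d, q_d).

    H(p, q) = KL(p || q) + H(p), and H(p) does not depend on the model,
    so minimizing this sum also minimizes each per-coefficient KL term.
    """
    loss = 0.0
    for logits, y, w in zip(logits_per_coef, clean, weights):
        p = gaussian_target(y, centers, sigma)  # target distribution for coefficient d
        log_q = log_softmax(logits)             # model's softmax output for coefficient d
        loss += w * -sum(pk * lq for pk, lq in zip(p, log_q))
    return loss


def expected_value(logits: Sequence[float], centers: Sequence[float]) -> float:
    """Hypothetical read-out: the mean of the predicted distribution."""
    q = [math.exp(lq) for lq in log_softmax(logits)]
    return sum(qk * c for qk, c in zip(q, centers))


if __name__ == "__main__":
    centers = bin_centers(-1.0, 1.0, 8)
    clean = [0.2, -0.5]   # toy clean-speech coefficients
    weights = [1.0, 1.0]  # uniform weights (an assumption)
    # Logits whose softmax equals the target exactly vs. a flat prediction.
    matched = [[-((c - y) ** 2) / 0.02 for c in centers] for y in clean]
    flat = [[0.0] * len(centers) for _ in clean]
    print("matched:", distributional_loss(matched, clean, centers, 0.1, weights))
    print("flat:   ", distributional_loss(flat, clean, centers, 0.1, weights))
    print("estimates:", [expected_value(lg, centers) for lg in matched])
```

In the toy run, the flat prediction costs ln 8 per coefficient, while the matched logits reach the lower bound H(p), where the KL term is zero. A smoothed target, rather than a one-hot bin, is used here only as one plausible "selected target distribution"; the abstract does not say which target the paper uses.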