Vector-to-Vector Regression via Distributional Loss for Speech Enhancement

8 January 2021

journal article
research article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Signal Processing Letters

Vol. 28, pp. 254-258 (ISSN 1070-9908)
https://doi.org/10.1109/lsp.2021.3050386

Abstract

In this work, we leverage a novel distributional loss to improve vector-to-vector regression for feature-based speech enhancement (SE). The distributional loss function is based on the Kullback-Leibler divergence between a selected target distribution and a conditional distribution to be learned from the data for each coefficient in the clean speech vector given the noisy input features. A deep model with one softmax layer per coefficient is employed to parametrize the conditional distribution, and the model parameters are found by minimizing a weighted sum of the cross-entropies between its outputs and the respective target distributions. Experiments with convolutional neural networks (CNNs) on a publicly available noisy speech dataset obtained from the Voice Bank corpus show consistent improvement over conventional solutions based on the mean squared error (MSE) and the least absolute deviation (LAD). Moreover, our approach compares favourably in terms of both speech quality and intelligibility against Mixture Density Networks (MDNs), an approach that also computes parametric conditional distributions, using Gaussian mixture models (GMMs) and a neural architecture.
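
The loss described in the abstract can be illustrated with a minimal NumPy sketch. Only the overall structure comes from the abstract: one softmax output per clean-speech coefficient, and a weighted sum of cross-entropies against per-coefficient target distributions. Everything else is an assumption made for illustration and is not taken from the paper: the discretization of each coefficient's range into bins (`bin_centers`), the discretized Gaussian target with width `sigma`, the uniform `weights`, and the decoding of a point estimate as the expected bin value. Minimizing the cross-entropy is equivalent to minimizing the Kullback-Leibler divergence here because the entropy of the fixed target does not depend on the model parameters.

```python
import numpy as np

def target_distribution(y, bin_centers, sigma):
    """Assumed target: a discretized Gaussian over the bins, centered on each
    clean coefficient y[d]. Returns an array of shape (D, K)."""
    z = -0.5 * ((bin_centers[None, :] - y[:, None]) / sigma) ** 2
    p = np.exp(z - z.max(axis=1, keepdims=True))  # subtract max for stability
    return p / p.sum(axis=1, keepdims=True)

def log_softmax(logits):
    """Numerically stable log-softmax along the bin axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def distributional_loss(logits, target, weights):
    """Weighted sum over coefficients of the cross-entropy between the target
    distribution and the softmax output for that coefficient."""
    ce = -(target * log_softmax(logits)).sum(axis=1)  # shape (D,)
    return float((weights * ce).sum())

def expected_value(logits, bin_centers):
    """Assumed decoding step: point estimate as the mean of the predicted
    distribution over bin centers."""
    q = np.exp(log_softmax(logits))
    return q @ bin_centers

# Toy usage: D = 2 coefficients, K = 21 bins on [-1, 1] (hypothetical values).
bin_centers = np.linspace(-1.0, 1.0, 21)
y_clean = np.array([0.3, -0.5])
target = target_distribution(y_clean, bin_centers, sigma=0.1)
logits = np.zeros((2, 21))   # stand-in for the network's per-coefficient outputs
weights = np.ones(2)         # uniform weighting, an assumption
loss = distributional_loss(logits, target, weights)
```

In a real system the `logits` would come from the CNN given the noisy input features, and the loss would be averaged over frames and minimized by gradient descent; the sketch only shows how the per-coefficient terms are combined.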
