A Regression Approach to Speech Enhancement Based on Deep Neural Networks
- 21 October 2014
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE/ACM Transactions on Audio, Speech, and Language Processing
- Vol. 23 (1), 7-19
- https://doi.org/10.1109/taslp.2014.2364452
Abstract
In contrast to the conventional minimum mean square error (MMSE)-based noise reduction techniques, we propose a supervised method to enhance speech by means of finding a mapping function between noisy and clean speech signals based on deep neural networks (DNNs). To handle a wide range of additive noises in real-world situations, a large training set that encompasses many possible combinations of speech and noise types is first designed. A DNN architecture is then employed as a nonlinear regression function to ensure a powerful modeling capability. Several techniques have also been proposed to improve the DNN-based speech enhancement system, including global variance equalization to alleviate the over-smoothing problem of the regression model, and the dropout and noise-aware training strategies to further improve the generalization capability of DNNs to unseen noise conditions. Experimental results demonstrate that the proposed framework can achieve significant improvements in both objective and subjective measures over the conventional MMSE-based technique. It is also interesting to observe that the proposed DNN approach can suppress highly nonstationary noise well, which is generally difficult to handle. Furthermore, the resulting DNN model, trained with artificially synthesized data, is also effective in dealing with noisy speech data recorded in real-world scenarios without generating the annoying musical artifacts commonly observed in conventional enhancement methods.
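Two of the techniques named in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation; the helper names `global_variance`, `gv_equalize` and `noise_aware_input` are hypothetical, and the details are assumptions: frames are lists of log-power spectral features, global variance equalization is modeled as scaling each feature dimension of the DNN output by a factor eta(d) = sqrt(GV_ref(d) / GV_est(d)) around its mean, and the noise-aware input appends a noise estimate taken as the mean of the first few frames of the utterance to a context window of frames.

```python
from math import sqrt
from statistics import fmean, pvariance


def global_variance(frames):
    """Per-dimension variance over all frames (each frame is a list of floats)."""
    dims = len(frames[0])
    return [pvariance([f[d] for f in frames]) for d in range(dims)]


def gv_equalize(estimated, reference_gv):
    """Hypothetical GV-equalization post-process (assumed formulation).

    Scales each feature dimension of the DNN estimate around its mean so that
    its variance matches reference_gv, assumed to be computed on clean
    training targets. This counteracts the over-smoothing of the regression.
    """
    est_gv = global_variance(estimated)
    dims = len(est_gv)
    means = [fmean(f[d] for f in estimated) for d in range(dims)]
    # eta(d) = sqrt(GV_ref(d) / GV_est(d)); leave constant dimensions unscaled.
    eta = [sqrt(reference_gv[d] / est_gv[d]) if est_gv[d] > 0 else 1.0
           for d in range(dims)]
    return [[means[d] + eta[d] * (f[d] - means[d]) for d in range(dims)]
            for f in estimated]


def noise_aware_input(frames, t, context, num_noise_frames):
    """Hypothetical noise-aware DNN input for frame t (assumed layout).

    Concatenates frames t-context .. t+context (indices clamped at the
    utterance edges) and appends a noise estimate: the per-dimension mean of
    the first num_noise_frames frames, assumed to contain little speech.
    """
    n = len(frames)
    window = []
    for k in range(t - context, t + context + 1):
        window.extend(frames[min(max(k, 0), n - 1)])
    dims = len(frames[0])
    head = frames[:num_noise_frames]
    noise = [fmean(f[d] for f in head) for d in range(dims)]
    return window + noise


if __name__ == "__main__":
    est = [[0.0, 1.0], [2.0, 1.0]]
    print(gv_equalize(est, [4.0, 1.0]))
    print(noise_aware_input([[1.0], [3.0], [5.0]], 0, 1, 2))
```

In the usage example, the first dimension of the estimate has variance 1.0 against a reference of 4.0, so it is stretched by a factor of 2 around its mean; the second dimension is constant and is left unchanged. Scaling around the per-utterance mean is a simplifying choice here; a system that normalizes features with training-set statistics might apply the factor in that normalized domain instead.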
Funding Information
- National Natural Science Foundation of China (61273264, 61305002)
- National 973 Program of China (2012CB326405)
This publication has 38 references indexed in Scilit:
- Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning, 2009
- Reducing the Dimensionality of Data with Neural Networks. Science, 2006
- A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 2006
- Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging. IEEE Transactions on Speech and Audio Processing, 2003
- SNR estimation based on amplitude modulation analysis with applications to noise suppression. IEEE Transactions on Speech and Audio Processing, 2003
- Elimination of the musical noise phenomenon with the Ephraim and Malah noise suppressor. IEEE Transactions on Speech and Audio Processing, 1994
- Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1985
- Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1984
- Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1979
- The Design of Optimum Multifactorial Experiments. Biometrika, 1946