Abstract
We describe a method for estimating the power spectral density of nonstationary noise from a noisy speech signal. The method can be combined with any speech enhancement algorithm that requires a noise power spectral density estimate. In contrast to other methods, our approach does not use a voice activity detector. Instead, it tracks spectral minima in each frequency band without distinguishing between speech activity and speech pauses. By minimizing a conditional mean square estimation error criterion at each time step, we derive the optimal smoothing parameter for recursive smoothing of the power spectral density of the noisy speech signal. Based on the optimally smoothed power spectral density estimate and an analysis of the statistics of spectral minima, we develop an unbiased noise estimator. The estimator is well suited to real-time implementation. Furthermore, to improve performance in nonstationary noise, we introduce a method that speeds up the tracking of spectral minima. Finally, we evaluate the proposed method in the context of speech enhancement and low-bit-rate speech coding with various noise types.
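
The core idea of tracking spectral minima without a voice activity detector can be illustrated with a deliberately simplified sketch. It is not the estimator described above: it assumes a fixed smoothing parameter `alpha` in place of the optimal time-varying one, uses a plain sliding-window minimum, and omits the bias compensation and the faster minimum tracking. The function name `track_noise_psd`, its parameters, and the default values are hypothetical and chosen only for illustration.

```python
from collections import deque


def track_noise_psd(periodograms, alpha=0.85, window=8):
    """Simplified minimum-tracking noise PSD estimate (illustrative only).

    periodograms: sequence of frames, each a list of |Y(k)|^2 values,
                  one per frequency bin.
    alpha:        fixed recursive smoothing parameter (an assumption; the
                  method in the abstract derives a time-varying optimum).
    window:       number of past smoothed frames searched for the minimum.
    Returns one noise PSD estimate (list of floats) per input frame.
    """
    smoothed = None
    history = deque(maxlen=window)  # most recent smoothed PSD frames
    estimates = []
    for frame in periodograms:
        if smoothed is None:
            # Initialise the smoothed PSD with the first periodogram.
            smoothed = list(frame)
        else:
            # First-order recursive smoothing in every frequency bin.
            smoothed = [alpha * p + (1.0 - alpha) * y
                        for p, y in zip(smoothed, frame)]
        history.append(smoothed)
        # Per-bin minimum over the window, taken regardless of whether
        # speech is present in any of the frames.
        estimates.append([min(bin_values) for bin_values in zip(*history)])
    return estimates
```

In this sketch a short burst of high energy in one bin does not raise the estimate, provided the window still contains frames from before the burst. The raw minimum underestimates the mean noise power, which is why a bias compensation based on the statistics of spectral minima is needed for an unbiased estimator.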
