Mask-based enhancement for very low quality speech

Abstract
We propose a mask-based enhancer for very low quality speech that preserves important cues in a noise-robust manner by identifying the time-frequency regions that contain significant speech energy. A classifier estimates the time-frequency mask from an input feature set that provides information about the energy distribution of both voiced and unvoiced speech. We evaluate the enhancer on a range of noisy speech signals and demonstrate that it yields consistent improvements in an objective intelligibility measure.
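
To make the idea of a time-frequency mask concrete, the following is a minimal sketch, not the method of the paper. It assumes an oracle binary mask computed from known speech and noise power, which stands in for the classifier-estimated mask because the abstract does not specify the classifier or its features. The function names, the 0 dB threshold, and the 0.1 floor gain are all hypothetical choices made for illustration.

```python
import numpy as np


def oracle_binary_mask(speech_power, noise_power, threshold_db=0.0):
    """Mark time-frequency cells whose local SNR exceeds threshold_db.

    Assumption: speech and noise power are known separately (an oracle).
    A real system would estimate the mask with a classifier instead.
    """
    eps = 1e-12  # avoids division by zero and log of zero
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (local_snr_db > threshold_db).astype(float)


def apply_mask(noisy_stft, mask, floor_gain=0.1):
    """Keep masked-in cells unchanged and attenuate the rest by floor_gain."""
    gain = np.where(mask > 0.5, 1.0, floor_gain)
    return gain * noisy_stft


# Toy spectrogram: 3 frequency bins x 4 frames (illustrative values only).
speech_power = np.array([
    [4.0, 0.0, 4.0, 0.0],
    [0.0, 0.0, 9.0, 9.0],
    [0.0, 0.0, 0.0, 0.0],
])
noise_power = np.ones((3, 4))

mask = oracle_binary_mask(speech_power, noise_power)

# Stand-in for the complex STFT of the noisy signal (zero phase for simplicity).
noisy_stft = np.sqrt(speech_power + noise_power).astype(complex)
enhanced = apply_mask(noisy_stft, mask)
```

In this sketch, cells dominated by noise are attenuated rather than zeroed; a nonzero floor gain is one common way to limit audible artifacts, although the abstract does not say which form of mask application is used.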
