Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction

1 March 2008

journal article
research article
Published by Acoustical Society of America (ASA) in The Journal of the Acoustical Society of America

Vol. 123 (3), 1673-1682
https://doi.org/10.1121/1.2832617

Abstract

The application of the ideal binary mask to an auditory mixture has been shown to yield substantial improvements in intelligibility. This mask is commonly applied to the time–frequency (T-F) representation of a mixture signal and eliminates portions of a signal below a signal-to-noise-ratio (SNR) threshold while allowing others to pass through intact. The factors influencing intelligibility of ideal binary-masked speech are not well understood and are examined in the present study. Specifically, the effects of the local SNR threshold, input SNR level, masker type, and errors introduced in estimating the ideal mask are examined. Consistent with previous studies, intelligibility of binary-masked stimuli is quite high even at -10 dB SNR for all maskers tested. Performance was affected the most when the masker-dominated T-F units were wrongly labeled as target-dominated T-F units. Performance plateaued near 100% correct for SNR thresholds ranging from -20 to 5 dB. The existence of the plateau region suggests that it is the pattern of the ideal binary mask that matters the most rather than the local SNR of each T-F unit. This pattern directs the listener’s attention to where the target is and enables them to segregate speech effectively in multitalker environments.
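
The masking rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the T-F decomposition, so the sketch assumes the per-unit target and masker powers are already available as arrays (they are known separately only because the mask is "ideal"). The function names `ideal_binary_mask`, `apply_mask`, and `flip_masker_dominated`, and the toy power values, are hypothetical and chosen for illustration. The last function mimics the error type the abstract reports as most harmful, where masker-dominated units are wrongly labeled as target-dominated.

```python
import numpy as np


def ideal_binary_mask(target_power, masker_power, threshold_db=0.0, eps=1e-12):
    """Return 1 for T-F units whose local SNR (dB) exceeds threshold_db, else 0.

    Assumes target_power and masker_power are same-shaped arrays of
    per-unit power, which requires access to the premixed signals.
    """
    local_snr_db = 10.0 * np.log10((target_power + eps) / (masker_power + eps))
    return (local_snr_db > threshold_db).astype(np.uint8)


def apply_mask(mixture_tf, mask):
    """Zero out masked units of the mixture; retained units pass through intact."""
    return mixture_tf * mask


def flip_masker_dominated(mask, error_rate, rng):
    """Relabel a fraction of masker-dominated units (0) as target-dominated (1)."""
    noisy = mask.copy()
    zeros = np.flatnonzero(mask == 0)          # flat indices of masker-dominated units
    n_flip = int(round(error_rate * zeros.size))
    chosen = rng.choice(zeros, size=n_flip, replace=False)
    np.put(noisy, chosen, 1)                   # in-place update using flat indices
    return noisy


# Toy 2x2 "spectrogram" (hypothetical values): local SNRs are roughly
# +6 dB, -20 dB, +3 dB, and -6 dB.
target_power = np.array([[4.0, 0.01], [2.0, 0.5]])
masker_power = np.array([[1.0, 1.0], [1.0, 2.0]])

mask_0db = ideal_binary_mask(target_power, masker_power, threshold_db=0.0)
mask_m10db = ideal_binary_mask(target_power, masker_power, threshold_db=-10.0)
mixture_power = target_power + masker_power    # assumes uncorrelated sources
masked = apply_mask(mixture_power, mask_0db)
```

Lowering the threshold from 0 dB to -10 dB retains one more unit in this toy example (the one near -6 dB), which shows how the local SNR threshold changes the mask pattern that the study varies.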

