A classification based approach to speech segregation

1 November 2012

journal article
research article
Published by Acoustical Society of America (ASA) in The Journal of the Acoustical Society of America

Vol. 132 (5), 3475-3483
https://doi.org/10.1121/1.4754541

Abstract

A key problem in computational auditory scene analysis (CASA) is monaural speech segregation, which has proven to be very challenging. For monaural mixtures, one can only utilize the intrinsic properties of speech or interference to segregate target speech from background noise. The ideal binary mask (IBM) has been proposed as a main goal of sound segregation in CASA and has led to substantial improvements in human speech intelligibility in noise. This study proposes a classification approach to estimate the IBM and employs support vector machines to classify time-frequency units as either target- or interference-dominant. A re-thresholding method is incorporated to improve classification results and maximize the hit minus false-alarm rate. An auditory segmentation stage is utilized to further improve the estimated masks. Systematic evaluations show that the proposed approach produces high-quality estimated IBMs and outperforms a recent system in terms of classification accuracy.
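
The quantities named in the abstract can be illustrated with a short sketch. It assumes the commonly used definition of the IBM (a time-frequency unit is labeled 1 when its local SNR exceeds a local criterion, here 0 dB) and a simple grid search for the re-thresholding step. The function names, the candidate thresholds, and the use of raw classifier scores are illustrative assumptions, not the procedure of the paper itself.

```python
import numpy as np

def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Label a T-F unit 1 when its local SNR exceeds lc_db (assumed criterion)."""
    eps = np.finfo(float).tiny  # guards against division by zero and log(0)
    snr_db = 10.0 * np.log10((target_energy + eps) / (noise_energy + eps))
    return (snr_db > lc_db).astype(int)

def hit_minus_fa(estimated, ideal):
    """HIT: fraction of target-dominant units labeled 1.
    FA: fraction of interference-dominant units wrongly labeled 1."""
    estimated = np.asarray(estimated).astype(bool)
    ideal = np.asarray(ideal).astype(bool)
    hit = estimated[ideal].mean() if ideal.any() else 0.0
    fa = estimated[~ideal].mean() if (~ideal).any() else 0.0
    return hit - fa

def rethreshold(scores, ideal, candidates):
    """Hypothetical re-thresholding: choose the decision threshold on
    classifier scores that maximizes HIT - FA on labeled data."""
    return max(candidates, key=lambda t: hit_minus_fa(scores > t, ideal))

# Toy 2x2 example: energies per T-F unit (rows: channels, columns: frames).
target = np.array([[4.0, 1.0], [1.0, 9.0]])
noise = np.array([[1.0, 4.0], [2.0, 1.0]])
ibm = ideal_binary_mask(target, noise)

# Made-up classifier scores (e.g. signed distances from an SVM hyperplane).
scores = np.array([[0.8, -0.5], [-0.9, -0.1]])
best = rethreshold(scores, ibm, candidates=[-1.0, -0.3, 0.0, 0.5])
```

In this toy case the default threshold of 0 misses one target-dominant unit, while a lower threshold recovers it without adding false alarms, which is the kind of trade-off a re-thresholding step is meant to exploit.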

Cited by 74 articles