A classification based approach to speech segregation
- 1 November 2012
- journal article
- research article
- Published by Acoustical Society of America (ASA) in The Journal of the Acoustical Society of America
- Vol. 132 (5), 3475-3483
- https://doi.org/10.1121/1.4754541
Abstract
A key problem in computational auditory scene analysis (CASA) is monaural speech segregation, which has proven to be very challenging. For monaural mixtures, one can only utilize the intrinsic properties of speech or interference to segregate target speech from background noise. Ideal binary mask (IBM) has been proposed as a main goal of sound segregation in CASA and has led to substantial improvements of human speech intelligibility in noise. This study proposes a classification approach to estimate the IBM and employs support vector machines to classify time-frequency units as either target- or interference-dominant. A re-thresholding method is incorporated to improve classification results and maximize hit minus false alarm rates. An auditory segmentation stage is utilized to further improve estimated masks. Systematic evaluations show that the proposed approach produces high quality estimated IBMs and outperforms a recent system in terms of classification accuracy.
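The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common definition of the IBM (a time-frequency unit is labeled 1 when its local SNR exceeds a local criterion, here a hypothetical `lc_db` of 0 dB) and treats re-thresholding as a simple search for the decision threshold that maximizes hit minus false alarm (HIT-FA) on labeled data. The function names and the `scores` array, which stands in for per-unit SVM decision values, are illustrative assumptions; no SVM is trained here.

```python
import numpy as np

def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Label a T-F unit 1 when its local SNR (dB) exceeds lc_db (assumed criterion)."""
    eps = np.finfo(float).tiny  # guards against division by zero and log of zero
    snr_db = 10.0 * np.log10((target_energy + eps) / (noise_energy + eps))
    return (snr_db > lc_db).astype(int)

def hit_minus_fa(ibm, estimated):
    """HIT: fraction of target-dominant units kept; FA: fraction of
    interference-dominant units wrongly kept. Returns HIT - FA."""
    ibm = np.asarray(ibm, dtype=bool)
    estimated = np.asarray(estimated, dtype=bool)
    hit = estimated[ibm].mean() if ibm.any() else 0.0
    fa = estimated[~ibm].mean() if (~ibm).any() else 0.0
    return hit - fa

def rethreshold(scores, ibm):
    """Pick the threshold on classifier scores that maximizes HIT-FA.
    A plain exhaustive search over observed scores, used here for illustration."""
    candidates = np.unique(scores)
    best = max(candidates, key=lambda th: hit_minus_fa(ibm, scores >= th))
    return float(best)

# Toy 2x2 example: energies per T-F unit (hypothetical values).
target = np.array([[4.0, 1.0], [1.0, 9.0]])
noise = np.array([[1.0, 4.0], [2.0, 1.0]])
ibm = ideal_binary_mask(target, noise)

# Hypothetical classifier scores; the default threshold of 0 misses one target unit.
scores = np.array([[-0.3, -1.0], [-0.6, 0.8]])
before = hit_minus_fa(ibm, scores >= 0.0)
threshold = rethreshold(scores, ibm)
after = hit_minus_fa(ibm, scores >= threshold)
```

In this toy case, lowering the threshold below the default recovers the missed target-dominant unit without admitting interference-dominant ones, which is the kind of gain a re-thresholding step aims for. In practice the threshold would be chosen on held-out data rather than on the units being evaluated.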
This publication has 20 references indexed in Scilit:
- On strategies for imbalanced text classification using SVM: A comparative study. Decision Support Systems, 2009
- An algorithm that improves speech intelligibility in noise for normal-hearing listeners. The Journal of the Acoustical Society of America, 2009
- Speech intelligibility in background noise with ideal binary time-frequency masking. The Journal of the Acoustical Society of America, 2009
- Segregation of unvoiced speech from nonspeech interference. The Journal of the Acoustical Society of America, 2008
- Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction. The Journal of the Acoustical Society of America, 2008
- Determination of the Potential Benefit of Time-Frequency Gain Manipulation. Ear & Hearing, 2006
- On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis. Published by Springer Science and Business Media LLC, 2006
- Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation. IEEE Transactions on Neural Networks, 2004
- SNR estimation based on amplitude modulation analysis with applications to noise suppression. IEEE Transactions on Speech and Audio Processing, 2003
- Gender recognition from speech. Part I: Coarse analysis. The Journal of the Acoustical Society of America, 1991