Separation of Singing Voice From Music Accompaniment for Monaural Recordings

23 April 2007

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Audio, Speech, and Language Processing

Vol. 15 (4), 1475-1487
https://doi.org/10.1109/tasl.2006.889789

Abstract

Separating singing voice from music accompaniment is very useful in many applications, such as lyrics recognition and alignment, singer identification, and music information retrieval. Although speech separation has been extensively studied for decades, singing voice separation has been little investigated. We propose a system to separate singing voice from music accompaniment for monaural recordings. Our system consists of three stages. The singing voice detection stage partitions and classifies an input into vocal and nonvocal portions. For vocal portions, the predominant pitch detection stage detects the pitch of the singing voice and then the separation stage uses the detected pitch to group the time-frequency segments of the singing voice. Quantitative results show that the system performs the separation task successfully.
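
The idea behind the final stage, using a detected pitch to decide which time-frequency units belong to the voice, can be illustrated with a small sketch. This is not the system described in the abstract: it is a simplified stand-in that assumes a linear-frequency spectrogram, takes the vocal/nonvocal decision and the pitch track as given inputs, and groups bins with a plain harmonic mask. The names `harmonic_mask`, `apply_mask`, `bin_hz`, and `tol_hz` are hypothetical and chosen only for illustration.

```python
def harmonic_mask(pitch_track, n_bins, bin_hz, tol_hz=40.0):
    """Build a binary time-frequency mask (frames x bins).

    pitch_track: per-frame pitch in Hz, or None for frames classified
                 as nonvocal (assumed to come from earlier stages).
    n_bins:      number of frequency bins per frame.
    bin_hz:      spacing between bin centre frequencies, in Hz.
    tol_hz:      a bin is assigned to the voice if its centre frequency
                 lies within tol_hz of a harmonic of the frame's pitch.
    """
    mask = []
    for f0 in pitch_track:
        row = [0] * n_bins
        if f0 is not None and f0 > 0:
            for k in range(n_bins):
                freq = k * bin_hz
                h = round(freq / f0)  # nearest harmonic number
                if h >= 1 and abs(freq - h * f0) <= tol_hz:
                    row[k] = 1
        mask.append(row)
    return mask


def apply_mask(spectrogram, mask):
    """Keep only the magnitudes of bins assigned to the voice."""
    return [
        [m * s for m, s in zip(mask_row, spec_row)]
        for mask_row, spec_row in zip(mask, spectrogram)
    ]


# Toy input: one nonvocal frame, then one vocal frame with a 200 Hz pitch.
pitch_track = [None, 200.0]
spectrogram = [[1.0] * 9, [1.0] * 9]  # placeholder magnitudes
mask = harmonic_mask(pitch_track, n_bins=9, bin_hz=100.0, tol_hz=10.0)
voice = apply_mask(spectrogram, mask)
```

In this toy case the nonvocal frame is masked out entirely, and the vocal frame keeps only the bins at 200, 400, 600, and 800 Hz. A binary mask is used here only because it is the simplest way to show pitch-based grouping.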


Cited by 100 articles