Separation of Singing Voice From Music Accompaniment for Monaural Recordings
- 23 April 2007
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Audio, Speech, and Language Processing
- Vol. 15 (4), 1475-1487
- https://doi.org/10.1109/tasl.2006.889789
Abstract
Separating singing voice from music accompaniment is very useful in many applications, such as lyrics recognition and alignment, singer identification, and music information retrieval. Although speech separation has been extensively studied for decades, singing voice separation has been little investigated. We propose a system to separate singing voice from music accompaniment for monaural recordings. Our system consists of three stages. The singing voice detection stage partitions and classifies an input into vocal and nonvocal portions. For vocal portions, the predominant pitch detection stage detects the pitch of the singing voice and then the separation stage uses the detected pitch to group the time-frequency segments of the singing voice. Quantitative results show that the system performs the separation task successfully.
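As a rough illustration of the third stage only, the sketch below shows one simple way a detected pitch could be used to select time-frequency content belonging to the voice. It is not the method of the paper: it assumes a magnitude spectrogram on a linear frequency grid, a per-frame pitch track in which 0 marks nonvocal frames, and a fixed frequency tolerance around each harmonic. The names `harmonic_mask`, `separate`, and `tolerance_hz` are hypothetical.

```python
import numpy as np


def harmonic_mask(pitch_hz, freqs_hz, tolerance_hz=20.0):
    """Boolean mask over frequency bins: True where a bin lies near a harmonic.

    Assumption: a bin belongs to the voice if it is within tolerance_hz of an
    integer multiple of the detected pitch. A pitch of 0 marks a nonvocal frame.
    """
    if pitch_hz <= 0:
        return np.zeros_like(freqs_hz, dtype=bool)
    # Nearest harmonic number for each bin, never lower than the fundamental.
    harmonic = np.maximum(1, np.round(freqs_hz / pitch_hz))
    return np.abs(freqs_hz - harmonic * pitch_hz) <= tolerance_hz


def separate(spectrogram, pitch_track, freqs_hz, tolerance_hz=20.0):
    """Split a (n_bins, n_frames) magnitude spectrogram into voice and accompaniment.

    pitch_track holds one pitch value in Hz per frame (0 for nonvocal frames).
    """
    mask = np.stack(
        [harmonic_mask(p, freqs_hz, tolerance_hz) for p in pitch_track], axis=1
    )
    voice = spectrogram * mask
    accompaniment = spectrogram * ~mask
    return voice, accompaniment


# Toy input: five bins spaced 100 Hz apart, one vocal frame and one nonvocal frame.
freqs = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
spec = np.ones((5, 2))
voice, accompaniment = separate(spec, pitch_track=[200.0, 0.0], freqs_hz=freqs)
```

In this toy input the first frame keeps only the bins at 200 Hz and 400 Hz for the voice, and the nonvocal second frame is assigned entirely to the accompaniment. The earlier stages (vocal/nonvocal classification and predominant pitch detection) are taken as given here and are represented only by the `pitch_track` argument.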
This publication has 20 references indexed in Scilit:
- Monaural Speech Segregation Based on Pitch Tracking and Amplitude Modulation. IEEE Transactions on Neural Networks, 2004
- A real-time music-scene-description system: predominant-F0 estimation for detecting melody and bass lines in real-world audio signals. Speech Communication, 2004
- Multiple fundamental frequency estimation based on harmonicity and spectral smoothness. IEEE Transactions on Speech and Audio Processing, 2003
- A multipitch tracking algorithm for noisy speech. IEEE Transactions on Speech and Audio Processing, 2003
- Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 2002
- Idiot's Bayes—Not So Stupid After All? International Statistical Review, 2001
- Classification of general audio data for content-based retrieval. Pattern Recognition Letters, 2001
- Separation of speech from interfering sounds based on oscillatory correlation. IEEE Transactions on Neural Networks, 1999
- A blackboard architecture for computational auditory scene analysis. Speech Communication, 1999
- The Acoustics of the Singing Voice. Scientific American, 1977