Time-Scale Modification of Audio Signals Using Enhanced WSOLA With Management of Transients
- 18 December 2007
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Audio, Speech, and Language Processing
- Vol. 16 (1), 106-115
- https://doi.org/10.1109/tasl.2007.909444
Abstract
In this paper, we present an algorithm for time-scale modification of music signals, based on the waveform similarity overlap-and-add technique (WSOLA). A well-known disadvantage of the standard WSOLA is the uniform time-scaling of the entire signal, including the perceptually significant transient sections (PSTs), where temporal envelope changes as well as significant spectral transitions occur. Time-scaling of PSTs can severely degrade the music quality. We address this problem by detecting the PSTs and leaving them intact, while time-scaling the remainder of the signal, which is relatively steady-state. In the proposed algorithm, the PSTs are detected using a Mel frequency cepstrum nonstationarity measure and the normalized cross-correlation, with time-varying threshold functions. Our study shows that the accurate detection of PSTs within the WSOLA framework makes it possible to achieve a higher quality of time-scaled music, as confirmed by subjective listening tests.
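The abstract names the normalized cross-correlation as one of the similarity measures used within the WSOLA framework. The sketch below is not the paper's implementation. It only illustrates, under simplifying assumptions, how a normalized cross-correlation can pick the best-matching segment inside a small search window, which is the core similarity step of a generic WSOLA. The function names `normalized_cross_correlation` and `best_offset`, and the parameters `start` and `tolerance`, are hypothetical and chosen for illustration. The PST detection with time-varying thresholds described in the abstract is not shown.

```python
import math


def normalized_cross_correlation(a, b):
    """Return the normalized cross-correlation of two equal-length
    sequences. The result lies in [-1, 1]; 0.0 is returned when either
    sequence has zero energy."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    dot = sum(x * y for x, y in zip(a, b))
    energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / energy if energy > 0.0 else 0.0


def best_offset(signal, template, start, tolerance):
    """Search offsets in [-tolerance, +tolerance] around `start` for the
    segment of `signal` most similar to `template`.

    Returns (offset, score). Candidates that would run past either end
    of the signal are skipped."""
    n = len(template)
    best_k, best_score = 0, -math.inf
    for k in range(-tolerance, tolerance + 1):
        pos = start + k
        if pos < 0 or pos + n > len(signal):
            continue  # candidate segment does not fit inside the signal
        score = normalized_cross_correlation(signal[pos:pos + n], template)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Illustrative usage on a synthetic sinusoid with a period of 20 samples.
signal = [math.sin(2 * math.pi * i / 20) for i in range(200)]
template = signal[43:83]
offset, score = best_offset(signal, template, start=40, tolerance=5)
```

In a full WSOLA, the chosen segment would then be windowed and overlap-added to the output. An approach like the one the abstract describes would additionally bypass this time-scaling step for segments flagged as PSTs.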