Prosody modification using instants of significant excitation
- 18 April 2006
- journal article
- Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Audio, Speech, and Language Processing
- Vol. 14 (3), 972-980
- https://doi.org/10.1109/tsa.2005.858051
Abstract
Prosody modification involves changing the pitch and duration of speech without affecting the message and naturalness. This paper proposes a method for prosody (pitch and duration) modification using the instants of significant excitation of the vocal tract system during the production of speech. The instants of significant excitation correspond to the instants of glottal closure (epochs) in the case of voiced speech, and to some random excitations like onset of burst in the case of nonvoiced speech. Instants of significant excitation are computed from the linear prediction (LP) residual of speech signals by using the property of average group-delay of minimum phase signals. The modification of pitch and duration is achieved by manipulating the LP residual with the help of the knowledge of the instants of significant excitation. The modified residual is used to excite the time-varying filter, whose parameters are derived from the original speech signal. Perceptual quality of the synthesized speech is good and is without any significant distortion. The proposed method is evaluated using waveforms, spectrograms, and listening tests. The performance of the method is compared with linear prediction pitch synchronous overlap and add (LP-PSOLA) method, which is another method for prosody manipulation based on the modification of the LP residual. The original and the synthesized speech signals obtained by the proposed method and by the LP-PSOLA method are available for listening at http://speech.cs.iitm.ernet.in/Main/result/prosody.html.
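The analysis and synthesis framing mentioned in the abstract (derive an LP residual, then use a residual to excite the filter) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a single fixed frame rather than a time-varying filter, uses the standard autocorrelation method for the LP coefficients, runs on a synthetic test signal, and leaves out both the group-delay epoch extraction and the residual manipulation. All function names and parameter values here are hypothetical.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Autocorrelation-method LP analysis.

    Returns a with a[0] = 1, so that A(z) = sum_k a[k] z^-k is the inverse filter.
    """
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R w = r[1:], where w are the predictor weights.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # small ridge for numerical safety (assumption)
    w = np.linalg.solve(R, r[1:])
    return np.concatenate(([1.0], -w))


def lp_residual(signal, a):
    """Inverse filtering with zero history: e[n] = sum_k a[k] s[n-k]."""
    return np.convolve(signal, a)[: len(signal)]


def all_pole_synthesis(excitation, a):
    """Excite the all-pole filter 1/A(z): s[n] = e[n] - sum_{k>=1} a[k] s[n-k]."""
    order = len(a) - 1
    out = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, min(order, n) + 1):
            acc -= a[k] * out[n - k]
        out[n] = acc
    return out


# Synthetic stand-in for voiced speech (assumption): an impulse train every
# 80 samples driving a stable two-pole resonator, plus a little noise.
rng = np.random.default_rng(0)
impulses = np.zeros(800)
impulses[::80] = 1.0
signal = all_pole_synthesis(impulses, np.array([1.0, -1.6, 0.9]))
signal = signal + 0.01 * rng.standard_normal(len(signal))

a = lp_coefficients(signal, order=4)
residual = lp_residual(signal, a)

# Exciting the filter with the unmodified residual recovers the input signal.
rebuilt = all_pole_synthesis(residual, a)
print("max reconstruction error:", np.max(np.abs(rebuilt - signal)))
```

In a prosody modification setting, the residual would be altered around the detected instants before the synthesis step; here it is passed through unchanged only to show that the analysis and synthesis filters are inverses of each other.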
This publication has 20 references indexed in Scilit:
- High quality time-scale modification for speech. Published by Institute of Electrical and Electronics Engineers (IEEE), 2005
- Speech representation and transformation using adaptive interpolation of weighted spectrum: vocoder revisited. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- Applying the harmonic plus noise model in concatenative speech synthesis. IEEE Transactions on Speech and Audio Processing, 2001
- Restructuring speech representations using a pitch-adaptive time–frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds. Speech Communication, 1999
- Robustness of group-delay-based method for extraction of significant instants of excitation from speech signals. IEEE Transactions on Speech and Audio Processing, 1999
- An iterative algorithm for decomposition of speech signals into periodic and aperiodic components. IEEE Transactions on Speech and Audio Processing, 1998
- Non-parametric techniques for pitch-scale and time-scale modification of speech. Speech Communication, 1995
- Determination of instants of significant excitation in speech using group delay function. IEEE Transactions on Speech and Audio Processing, 1995
- HNS: Speech modification based on a harmonic+noise model. Published by Institute of Electrical and Electronics Engineers (IEEE), 1993
- A weighted overlap-add method of short-time Fourier analysis/synthesis. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1980