Coherent modification of pitch and energy for expressive prosody implantation
- 1 April 2015
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- pp. 4914-4918
- https://doi.org/10.1109/icassp.2015.7178905
Abstract
In expressive TTS and voice transformation systems, implantation of expressive prosody derived from external out-of-domain sources often leads to extreme pitch modification that compromises the naturalness of the synthesized speech. In this work we investigate and prove a hypothesis that the naturalness loss is in part attributed to a violation of a fundamental relationship between the instantaneous pitch frequency and instantaneous energy of a speech signal. We propose an enhancement for pitch modification where the instantaneous energy is modified coherently with the pitch frequency and demonstrate the potential of this method in a subjective listening evaluation. The proposed approach is complementary to and can be combined with spectrum shape transformation methods for achieving the maximal possible quality of pitch modification.