Statistical parametric speech synthesis using deep neural networks

1 May 2013

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

p. 7962-7966
https://doi.org/10.1109/icassp.2013.6639215

Abstract

Conventional approaches to statistical parametric speech synthesis typically use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent probability densities of speech parameters given texts. Speech parameters are generated from the probability densities to maximize their output probabilities, then a speech waveform is reconstructed from the generated parameters. This approach is reasonably effective but has some limitations; for example, decision trees are inefficient at modeling complex context dependencies. This paper examines an alternative scheme that is based on a deep neural network (DNN). The relationship between input texts and their acoustic realizations is modeled by a DNN. The use of the DNN can address some limitations of the conventional approach. Experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.
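
As a rough illustration of the idea in the abstract, the sketch below maps a per-frame vector of linguistic (text-derived) features to a vector of acoustic parameters with a small feed-forward network. It is a minimal sketch under stated assumptions, not the paper's implementation: the layer sizes, the sigmoid hidden units, the linear output layer, the random untrained weights, and all function names (`build_dnn`, `predict_frame`, and so on) are hypothetical choices made for illustration.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one fully connected layer (random, untrained)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense(x, layer, activation=None):
    """Apply one fully connected layer, optionally followed by an activation."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    if activation is not None:
        out = [activation(v) for v in out]
    return out


def build_dnn(n_linguistic, hidden_sizes, n_acoustic, seed=0):
    """Build a list of layers: input -> hidden layers -> output."""
    rng = random.Random(seed)
    sizes = [n_linguistic] + list(hidden_sizes) + [n_acoustic]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def predict_frame(dnn, linguistic_features):
    """Map one frame of linguistic features to acoustic parameters."""
    h = linguistic_features
    for layer in dnn[:-1]:
        h = dense(h, layer, sigmoid)  # assumed sigmoid hidden units
    return dense(h, dnn[-1])          # assumed linear output layer


# Hypothetical sizes: 8 linguistic inputs, two hidden layers, 3 acoustic outputs.
dnn = build_dnn(n_linguistic=8, hidden_sizes=[16, 16], n_acoustic=3)
frame_input = [1.0, 0.0, 0.0, 1.0, 0.0, 0.5, 0.25, 1.0]
acoustic = predict_frame(dnn, frame_input)
print(len(acoustic))
```

In a real system the weights would be trained on paired linguistic and acoustic data, and the predicted parameters would then be passed to a parameter generation step and a vocoder to reconstruct the waveform.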

Cited by 209 articles