Statistical parametric speech synthesis using deep neural networks
- 1 May 2013
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 7962-7966
- https://doi.org/10.1109/icassp.2013.6639215
Abstract
Conventional approaches to statistical parametric speech synthesis typically use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent probability densities of speech parameters given texts. Speech parameters are generated from the probability densities to maximize their output probabilities, then a speech waveform is reconstructed from the generated parameters. This approach is reasonably effective but has limitations; for example, decision trees are inefficient at modeling complex context dependencies. This paper examines an alternative scheme based on a deep neural network (DNN), in which the relationship between input texts and their acoustic realizations is modeled by a DNN. The use of the DNN can address some limitations of the conventional approach. Experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.
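As a rough illustration of the idea in the abstract, a feed-forward network can map a per-frame vector of text-derived (linguistic) features to a vector of speech parameters. The sketch below is not the paper's system: the layer sizes, the tanh activation, the random untrained weights, and the names `init_mlp` and `forward` are all assumptions chosen only to show the shape of the mapping.

```python
import numpy as np

def init_mlp(sizes, seed=0):
    """Create random (untrained) weights for a fully connected network.

    sizes is a list such as [n_in, n_hidden, ..., n_out]. Hypothetical helper.
    """
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, x):
    """Map linguistic feature frames to acoustic parameter frames."""
    h = x
    for W, b in params[:-1]:
        h = np.tanh(h @ W + b)  # hidden layers (activation is an assumption)
    W, b = params[-1]
    return h @ W + b            # linear output layer for real-valued parameters

# Assumed dimensions: 300 linguistic features in, 40 acoustic parameters out.
params = init_mlp([300, 256, 256, 40])
frames = np.zeros((5, 300))     # 5 frames of placeholder input features
acoustic = forward(params, frames)
print(acoustic.shape)           # (5, 40)
```

In a real system the weights would be trained on paired text and speech data, and a vocoder would reconstruct the waveform from the predicted parameters; neither step is shown here.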