Investigating gated recurrent networks for speech synthesis
- 1 March 2016
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 5140-5144
- https://doi.org/10.1109/icassp.2016.7472657
Abstract
Recently, recurrent neural networks (RNNs) as powerful sequence models have re-emerged as a potential acoustic model for statistical parametric speech synthesis (SPSS). The long short-term memory (LSTM) architecture is particularly attractive because it addresses the vanishing gradient problem in standard RNNs, making them easier to train. Although recent studies have demonstrated that LSTMs can achieve significantly better performance on SPSS than deep feedforward neural networks, little is known about why. Here we attempt to answer two questions: a) why do LSTMs work well as a sequence model for SPSS; b) which component (e.g., input gate, output gate, forget gate) is most important. We present a visual analysis alongside a series of experiments, resulting in a proposal for a simplified architecture. The simplified architecture has significantly fewer parameters than an LSTM, thus reducing generation complexity considerably without degrading quality.
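To make the parameter-count claim concrete, the following is a minimal sketch of how the number of gates affects the size of one recurrent layer. It assumes a standard LSTM without peephole connections, where the cell candidate and each gate have their own input weights, recurrent weights, and bias. The abstract does not specify the simplified architecture, so the single-gate variant here is a hypothetical illustration, and the layer sizes (256 inputs, 256 hidden units) and the function name `gated_layer_params` are likewise assumptions.

```python
def gated_layer_params(n_in: int, n_hidden: int, n_gates: int) -> int:
    """Count weights and biases of one gated recurrent layer (no peepholes).

    Each block (the cell candidate plus every gate) has an input matrix
    (n_hidden x n_in), a recurrent matrix (n_hidden x n_hidden), and a
    bias vector (n_hidden).
    """
    per_block = n_hidden * (n_in + n_hidden) + n_hidden
    return (1 + n_gates) * per_block


# Hypothetical layer sizes, chosen only for illustration.
N_IN, N_HIDDEN = 256, 256

# Standard LSTM: input, forget, and output gates.
lstm_params = gated_layer_params(N_IN, N_HIDDEN, n_gates=3)

# Hypothetical simplified variant that keeps a single gate.
single_gate_params = gated_layer_params(N_IN, N_HIDDEN, n_gates=1)

print(f"LSTM layer:        {lstm_params} parameters")
print(f"Single-gate layer: {single_gate_params} parameters")
print(f"Ratio:             {single_gate_params / lstm_params:.2f}")
```

Under these assumptions, each gate removed saves one full set of input and recurrent weights, which is why dropping gates lowers both model size and per-frame computation at generation time.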