Multi-output RNN-LSTM for multiple speaker speech synthesis and adaptation
- 1 August 2016
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE) in 2016 24th European Signal Processing Conference (EUSIPCO)
- p. 2325-2329
- https://doi.org/10.1109/eusipco.2016.7760664
Abstract
Deep learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis using multiple speakers. Some hidden layers are shared by all the speakers, while there is a specific output layer for each speaker. Objective and perceptual experiments prove that this scheme produces much better results in comparison with a single-speaker model. Moreover, we also tackle the problem of speaker adaptation by adding a new output branch to the model and successfully training it without the need to modify the base optimized model. This fine-tuning method achieves better results than training the new speaker from scratch with its own model.
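The structure described in the abstract (shared hidden layers, one output layer per speaker, and adaptation by training only a newly added output branch) can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no layer sizes, features, or training details, so everything below is an assumption. It uses NumPy, a single LSTM layer with random weights standing in for a trained base model, linear output branches, and plain gradient descent. The names `SharedLSTM`, `MultiSpeakerModel`, `add_speaker`, `adapt`, and the speaker labels are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SharedLSTM:
    """One LSTM layer shared by all speakers (random weights stand in for a trained base model)."""
    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        # Stacked weights for the input, forget, and output gates plus the cell candidate.
        self.W = rng.normal(0.0, 0.3, size=(4 * n_hid, n_in + n_hid))
        self.b = np.zeros(4 * n_hid)

    def forward(self, x_seq):
        h = np.zeros(self.n_hid)
        c = np.zeros(self.n_hid)
        outputs = []
        for x in x_seq:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, o, g = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
            h = sigmoid(o) * np.tanh(c)
            outputs.append(h)
        return np.stack(outputs)  # shape (frames, n_hid)

class MultiSpeakerModel:
    """Shared hidden layer with one linear output branch per speaker."""
    def __init__(self, n_in, n_hid, n_out, speakers, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_hid, self.n_out = n_hid, n_out
        self.shared = SharedLSTM(n_in, n_hid, self.rng)
        self.heads = {}
        for name in speakers:
            self.add_speaker(name)

    def add_speaker(self, name):
        # A new output branch; the shared layer is not touched.
        self.heads[name] = {
            "W": self.rng.normal(0.0, 0.1, size=(self.n_out, self.n_hid)),
            "b": np.zeros(self.n_out),
        }

    def predict(self, x_seq, speaker):
        head = self.heads[speaker]
        return self.shared.forward(x_seq) @ head["W"].T + head["b"]

    def adapt(self, name, x_seq, y_seq, lr=0.1, steps=200):
        """Add a branch for a new speaker and fit only that branch."""
        self.add_speaker(name)
        head = self.heads[name]
        h = self.shared.forward(x_seq)  # features from the frozen shared layer
        losses = []
        for _ in range(steps):
            err = h @ head["W"].T + head["b"] - y_seq
            losses.append(float(np.mean(err ** 2)))
            # Gradient of the squared error, up to a constant factor.
            head["W"] -= lr * (err.T @ h) / len(h)
            head["b"] -= lr * err.mean(axis=0)
        return losses

# Toy data: 20 frames of hypothetical input features and acoustic targets.
model = MultiSpeakerModel(n_in=5, n_hid=8, n_out=3, speakers=["spk_a", "spk_b"])
data_rng = np.random.default_rng(1)
x = data_rng.normal(size=(20, 5))
y_new = data_rng.normal(size=(20, 3))

shared_before = model.shared.W.copy()
pred_a_before = model.predict(x, "spk_a")
losses = model.adapt("spk_new", x, y_new)
```

In this sketch, adaptation leaves `model.shared.W` and the existing branches unchanged, which mirrors the abstract's point that the new branch is trained without modifying the base model. A real system would train the shared layers jointly on all base speakers first.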
This publication has 12 references indexed in Scilit:
- Investigating gated recurrent networks for speech synthesis. Published by Institute of Electrical and Electronics Engineers (IEEE), 2016
- Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis. Published by Institute of Electrical and Electronics Engineers (IEEE), 2015
- Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis. Published by Institute of Electrical and Electronics Engineers (IEEE), 2015
- On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis. Published by Institute of Electrical and Electronics Engineers (IEEE), 2014
- Deep Learning: Methods and Applications. Foundations and Trends® in Signal Processing, 2014
- F0 contour prediction with a deep belief network-Gaussian process hybrid model. Published by Institute of Electrical and Electronics Engineers (IEEE), 2013
- Mel-cepstral distance measure for objective speech quality assessment. Published by Institute of Electrical and Electronics Engineers (IEEE), 2002
- The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 1998
- Long Short-Term Memory. Neural Computation, 1997
- Unit selection in a concatenative speech synthesis system using a large speech database. Published by Institute of Electrical and Electronics Engineers (IEEE), 1996