Multi-output RNN-LSTM for multiple speaker speech synthesis and adaptation

Abstract

Deep learning has been applied successfully to speech processing. In this paper we propose an architecture for speech synthesis with multiple speakers. Some hidden layers are shared by all speakers, while each speaker has its own output layer. Objective and perceptual experiments show that this scheme produces much better results than a single-speaker model. We also tackle speaker adaptation by adding a new output branch to the model and training it successfully without modifying the optimized base model. This fine-tuning method achieves better results than training a separate model for the new speaker from scratch.
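The shared-trunk, per-speaker-branch structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: for brevity a single feed-forward tanh layer stands in for the recurrent LSTM layers, the trunk is left at its random initialization instead of being trained jointly on several speakers, and the class name `MultiSpeakerNet`, its methods, the layer sizes, and the toy data are all hypothetical. It only shows how a new output branch might be added and trained while the shared weights stay frozen.

```python
import math
import random


class MultiSpeakerNet:
    """Shared hidden layer with one linear output head per speaker.

    Assumption: a feed-forward tanh layer replaces the LSTM layers
    of the paper to keep the sketch short and dependency-free.
    """

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        self.rng = random.Random(seed)
        self.n_hidden, self.n_out = n_hidden, n_out
        # Shared parameters, used by every speaker.
        self.w_shared = [[self.rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                         for _ in range(n_hidden)]
        # Speaker-specific output heads, keyed by speaker name.
        self.heads = {}

    def add_speaker(self, name):
        """Attach a new output branch; shared weights are left untouched."""
        self.heads[name] = [
            [self.rng.uniform(-0.5, 0.5) for _ in range(self.n_hidden)]
            for _ in range(self.n_out)
        ]

    def hidden(self, x):
        """Shared representation computed from the input features."""
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                for row in self.w_shared]

    def forward(self, x, speaker):
        """Route the shared representation through one speaker's head."""
        h = self.hidden(x)
        return [sum(w * hi for w, hi in zip(row, h))
                for row in self.heads[speaker]]

    def loss(self, data, speaker):
        """Squared error summed over outputs, averaged over (input, target) pairs."""
        total = 0.0
        for x, y in data:
            pred = self.forward(x, speaker)
            total += sum((p - t) ** 2 for p, t in zip(pred, y))
        return total / len(data)

    def adapt(self, data, speaker, lr=0.1, epochs=200):
        """Train only this speaker's head with SGD; the shared layer is frozen."""
        head = self.heads[speaker]
        for _ in range(epochs):
            for x, y in data:
                h = self.hidden(x)
                pred = [sum(w * hi for w, hi in zip(row, h)) for row in head]
                for k, row in enumerate(head):
                    err = pred[k] - y[k]
                    for j in range(len(row)):
                        # Gradient of (pred_k - y_k)^2 w.r.t. row[j] is 2 * err * h[j].
                        row[j] -= lr * 2 * err * h[j]


# Toy usage: adapt a new branch on two made-up (features, target) pairs.
net = MultiSpeakerNet(n_in=3, n_hidden=4, n_out=2)
net.add_speaker("new_speaker")
toy_data = [([1.0, 0.0, 0.0], [1.0, -1.0]), ([0.0, 1.0, 0.0], [0.5, 0.5])]
net.adapt(toy_data, "new_speaker")
```

Because `adapt` updates only the entries of `heads[speaker]`, outputs for any previously added speaker are unchanged by adaptation, which mirrors the abstract's point that the base model does not need to be modified.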
