Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis

1 April 2015

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

No. 15206149, p. 4470-4474
https://doi.org/10.1109/icassp.2015.7178816

Abstract

Long short-term memory recurrent neural networks (LSTM-RNNs) have been applied to various speech applications, including acoustic modeling for statistical parametric speech synthesis. One concern in applying them to text-to-speech applications is their effect on latency. To address this concern, this paper proposes a low-latency, streaming speech synthesis architecture using unidirectional LSTM-RNNs with a recurrent output layer. The use of a unidirectional RNN architecture allows frame-synchronous streaming inference of output acoustic features given input linguistic features. The recurrent output layer further encourages smooth transitions between acoustic features at consecutive frames. Experimental results in subjective listening tests show that the proposed architecture can synthesize natural-sounding speech without requiring utterance-level batch processing.
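
The streaming idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name, layer sizes, random weights, omitted bias terms, and the specific output recurrence y_t = W_hy h_t + W_yy y_(t-1) are all assumptions chosen for illustration. It shows only that a unidirectional model can emit one acoustic frame per input linguistic frame without waiting for the end of the utterance.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class StreamingLSTMSynthesizer:
    """Toy unidirectional LSTM with a recurrent linear output layer.

    Hypothetical illustration only: untrained random weights, no biases.
    """

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One weight matrix per gate, acting on the concatenation [x_t, h_{t-1}].
        self.W = {g: rand_matrix(n_hidden, n_in + n_hidden, rng) for g in "ifoc"}
        self.W_hy = rand_matrix(n_out, n_hidden, rng)  # hidden -> output
        self.W_yy = rand_matrix(n_out, n_out, rng)     # previous output -> output
        self.h = [0.0] * n_hidden
        self.c = [0.0] * n_hidden
        self.y = [0.0] * n_out

    def step(self, x):
        """Consume one linguistic feature frame, return one acoustic frame."""
        z = list(x) + self.h
        i = [sigmoid(a) for a in matvec(self.W["i"], z)]    # input gate
        f = [sigmoid(a) for a in matvec(self.W["f"], z)]    # forget gate
        o = [sigmoid(a) for a in matvec(self.W["o"], z)]    # output gate
        g = [math.tanh(a) for a in matvec(self.W["c"], z)]  # candidate cell value
        self.c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, self.c, i, g)]
        self.h = [ok * math.tanh(ck) for ok, ck in zip(o, self.c)]
        # Recurrent output layer (assumed form): the previous output frame
        # feeds into the current one, coupling consecutive frames.
        from_h = matvec(self.W_hy, self.h)
        from_y = matvec(self.W_yy, self.y)
        self.y = [a + b for a, b in zip(from_h, from_y)]
        return self.y


def stream(model, frames):
    # Frame-synchronous inference: each output depends only on past and
    # current inputs, so it can be yielded immediately.
    for x in frames:
        yield model.step(x)
```

Because the model state only moves forward in time, two utterances that share their first few input frames produce identical outputs for those frames, regardless of what follows. A bidirectional model would not have this property, which is why it would need the whole utterance before producing any output.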


Cited by 124 articles