Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
- 1 April 2015
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- No. 15206149, p. 4470-4474
- https://doi.org/10.1109/icassp.2015.7178816
Abstract
Long short-term memory recurrent neural networks (LSTM-RNNs) have been applied to various speech applications, including acoustic modeling for statistical parametric speech synthesis. One concern in applying them to text-to-speech applications is their effect on latency. To address this concern, this paper proposes a low-latency, streaming speech synthesis architecture using unidirectional LSTM-RNNs with a recurrent output layer. The use of a unidirectional RNN architecture allows frame-synchronous streaming inference of output acoustic features given input linguistic features. The recurrent output layer further encourages smooth transitions between acoustic features at consecutive frames. Experimental results in subjective listening tests show that the proposed architecture can synthesize natural-sounding speech without requiring utterance-level batch processing.
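The frame-synchronous inference described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The class name `StreamingLSTMSynthesizer`, the layer sizes, the random weights, and the specific output recurrence y_t = W_hy h_t + W_yy y_{t-1} + b_y are all assumptions made for illustration; the abstract states only that the output layer is recurrent. The sketch uses a single standard LSTM layer in pure Python, so that each acoustic frame is emitted as soon as its linguistic input frame arrives.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class StreamingLSTMSynthesizer:
    """Hypothetical unidirectional LSTM with a recurrent linear output layer."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One weight matrix per gate (input, forget, output, candidate),
        # each acting on the concatenation [x_t, h_{t-1}]. Weights are
        # random placeholders, not trained parameters.
        self.W = {k: rand_matrix(n_hidden, n_in + n_hidden, rng) for k in "ifoc"}
        self.b = {k: [0.0] * n_hidden for k in "ifoc"}
        self.W_hy = rand_matrix(n_out, n_hidden, rng)  # hidden -> output
        self.W_yy = rand_matrix(n_out, n_out, rng)     # previous output -> output
        self.b_y = [0.0] * n_out
        self.n_hidden, self.n_out = n_hidden, n_out
        self.reset()

    def reset(self):
        # State carried between frames: hidden, cell, and previous output.
        self.h = [0.0] * self.n_hidden
        self.c = [0.0] * self.n_hidden
        self.y = [0.0] * self.n_out

    def step(self, x):
        """Consume one linguistic feature frame, emit one acoustic frame."""
        z = list(x) + self.h
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
               for k in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        self.c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, self.c, i, cand)]
        self.h = [ov * math.tanh(cv) for ov, cv in zip(o, self.c)]
        # Assumed recurrent output layer: the previous output frame feeds
        # into the current one, which couples consecutive frames.
        lin = matvec(self.W_hy, self.h)
        rec = matvec(self.W_yy, self.y)
        self.y = [a + b + c for a, b, c in zip(lin, rec, self.b_y)]
        return self.y


def stream(model, frames):
    # Frame-synchronous generation: no look-ahead, no utterance-level pass.
    model.reset()
    for x in frames:
        yield model.step(x)
```

Because every quantity at frame t depends only on frames up to t, running `stream` on a prefix of an utterance yields the same acoustic frames as the first part of a full run. This is the property that removes the need for utterance-level batch processing. A bidirectional network would not have it, since its backward pass needs the whole input sequence.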