Advances in optimizing recurrent networks

1 May 2013

conference paper
Published by Institute of Electrical and Electronics Engineers (IEEE)

pp. 8624-8628
https://doi.org/10.1109/icassp.2013.6639349

Abstract

After more than a decade of relatively little research activity on recurrent neural networks, several new developments have allowed substantial progress, both in understanding and in technical solutions, towards more efficient training of recurrent networks; they are reviewed here. These advances have been motivated by and related to the optimization issues surrounding deep learning. Although recurrent networks are, in principle, extremely powerful at modeling sequences, their training is plagued by two aspects of the same issue regarding the learning of long-term dependencies. The experiments reported here evaluate gradient clipping, spanning longer time ranges with leaky integration, advanced momentum techniques, more powerful output probability models, and encouraging sparser gradients to help symmetry breaking and credit assignment. The experiments are performed on text and music data and show the combined effects of these techniques in generally improving both training and test error.
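Two of the techniques named in the abstract, gradient clipping and leaky integration, can be illustrated with a minimal sketch. The abstract does not give the exact formulations used in the paper, so the code below assumes common textbook forms: clipping rescales a gradient vector whose L2 norm exceeds a threshold, and leaky integration blends the previous hidden state with a new candidate state. The function names `clip_by_norm` and `leaky_update`, the parameters `threshold` and `alpha`, and the numeric values are all hypothetical and chosen only for illustration.

```python
import math


def clip_by_norm(grad, threshold):
    """Rescale grad so that its L2 norm does not exceed threshold.

    Assumed form (norm rescaling); the paper may use a different variant.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)  # already small enough, return an unchanged copy
    scale = threshold / norm
    return [g * scale for g in grad]


def leaky_update(h_prev, candidate, alpha):
    """Leaky integration of a hidden state.

    Keeps a fraction (1 - alpha) of the previous state and mixes in a
    fraction alpha of the candidate, so a small alpha makes the unit
    change slowly and span longer time ranges.
    """
    return [(1.0 - alpha) * h + alpha * c for h, c in zip(h_prev, candidate)]


# Illustrative values only (not taken from the paper).
clipped = clip_by_norm([3.0, 4.0], threshold=1.0)      # norm 5 rescaled to 1
state = leaky_update([1.0, 0.0], [0.0, 1.0], alpha=0.25)
```

Under these assumptions, clipping leaves the gradient direction unchanged and only bounds its magnitude, while `alpha` sets how quickly a leaky unit forgets its previous state.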

Other Versions

Version 2, 2012-12-05, preprints

Cited by 211 articles