LSTM: A Search Space Odyssey

8 July 2016

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Neural Networks and Learning Systems

Vol. 28 (10), 2222-2232
https://doi.org/10.1109/tnnls.2016.2582924

Abstract

Several variants of the long short-term memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful functional ANalysis Of VAriance framework. In total, we summarize the results of 5400 experimental runs ($\approx 15$ years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
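
The random search mentioned in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the hyperparameter names, their ranges, and `toy_objective` (a stand-in for "train a network and return its validation error") are hypothetical and are not taken from the paper's actual search space or code.

```python
import math
import random


def sample_config(rng):
    """Draw one hypothetical hyperparameter configuration (illustrative ranges)."""
    return {
        # Sampled log-uniformly, since learning rates span orders of magnitude.
        "learning_rate": 10 ** rng.uniform(-6, -2),
        "hidden_size": rng.randint(20, 200),
        "momentum": rng.uniform(0.0, 0.99),
    }


def toy_objective(config):
    """Stand-in for training a model and returning a validation error."""
    lr_term = (math.log10(config["learning_rate"]) + 3.5) ** 2
    momentum_term = 1.0 - config["momentum"]
    return lr_term + momentum_term


def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials independent random configurations and keep the best."""
    rng = random.Random(seed)
    best_config, best_score = None, math.inf
    for _ in range(n_trials):
        config = sample_config(rng)
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(toy_objective, n_trials=200)
print(best_config, best_score)
```

Because each trial is drawn independently, the trials can run in parallel, and the full set of (configuration, score) pairs could be kept for a later importance analysis rather than only the best one.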

Funding Information

Swiss National Science Foundation through the Project Theory and Practice of Reinforcement Learning 2, and through the Project Advanced Reinforcement Learning (138219, 156682)
European Institute of Innovation and Technology through the Project Neural Dynamics, through the Project NASCENCE and through the Project WAY (FP7-ICT-270247, FP7-ICT-317662, FP7-ICT-288551)