Abstract
We present a new approach to scalable training of deep learning machines by incremental block training with intra-block parallel optimization to leverage data parallelism and blockwise model-update filtering to stabilize the learning process. Using an implementation on a distributed GPU cluster with an MPI-based HPC machine learning framework to coordinate parallel job scheduling and collective communication, we have successfully trained deep bidirectional long short-term memory (LSTM) recurrent neural networks (RNNs) and fully-connected feed-forward deep neural networks (DNNs) for large vocabulary continuous speech recognition on two benchmark tasks, namely the 309-hour Switchboard-I task and the 1,860-hour "Switchboard+Fisher" task. We achieve almost linear speedup up to 16 GPU cards on the LSTM task and 64 GPU cards on the DNN task, with either no degradation or improved recognition accuracy in comparison with running traditional mini-batch based stochastic gradient descent (SGD) training on a single GPU.
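The abstract only names the two ingredients of the approach; the sketch below is a minimal illustration of how intra-block data parallelism and blockwise model-update filtering could fit together. It assumes a momentum-style filter over block-level model updates; all function and parameter names (intra_block_parallel_step, blockwise_model_update_filtering, block_momentum, block_lr, local_sgd) are hypothetical and are not taken from the paper.

```python
import numpy as np

def intra_block_parallel_step(global_model, data_splits, local_sgd):
    """Train one copy of the model per worker on its split of the data block,
    then average the resulting local models (illustrative data parallelism)."""
    local_models = [local_sgd(global_model.copy(), split) for split in data_splits]
    return np.mean(local_models, axis=0)

def blockwise_model_update_filtering(global_model, aggregated_model,
                                     prev_update, block_momentum=0.9,
                                     block_lr=1.0):
    """Apply a momentum-style filter to the block-level model update
    (an assumption for illustration) instead of adopting it directly."""
    raw_update = aggregated_model - global_model                   # update produced by this block
    update = block_momentum * prev_update + block_lr * raw_update  # filtered update
    return global_model + update, update                           # new global model, state for next block

# Toy usage: a linear model "trained" on random data blocks split across 4 workers.
rng = np.random.default_rng(0)
model = np.zeros(4)
prev_update = np.zeros(4)
local_sgd = lambda w, x: w - 0.01 * x.mean(axis=0)  # stand-in for mini-batch SGD on one worker
for block in range(10):
    splits = [rng.normal(size=(32, 4)) for _ in range(4)]
    aggregated = intra_block_parallel_step(model, splits, local_sgd)
    model, prev_update = blockwise_model_update_filtering(model, aggregated, prev_update)
```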
