Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering
- 1 March 2016
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 5880-5884
- https://doi.org/10.1109/icassp.2016.7472805
Abstract
We present a new approach to scalable training of deep learning machines: incremental block training with intra-block parallel optimization to leverage data parallelism, and blockwise model-update filtering to stabilize the learning process. Using an implementation on a distributed GPU cluster with an MPI-based HPC machine learning framework to coordinate parallel job scheduling and collective communication, we have successfully trained deep bidirectional long short-term memory (LSTM) recurrent neural networks (RNNs) and fully-connected feed-forward deep neural networks (DNNs) for large-vocabulary continuous speech recognition on two benchmark tasks, namely the 309-hour Switchboard-I task and the 1,860-hour "Switchboard+Fisher" task. We achieve almost linear speedup with up to 16 GPU cards on the LSTM task and 64 GPU cards on the DNN task, with either no degradation or improved recognition accuracy in comparison with traditional mini-batch stochastic gradient descent training on a single GPU.
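The core update described in the abstract can be sketched as follows. This is a minimal illustrative NumPy implementation of one blockwise model-update filtering (BMUF) step, not the paper's actual code: the function name, parameter names (`block_momentum`, `block_lr`), and the plain model-averaging step are assumptions for exposition.

```python
import numpy as np

def bmuf_step(w_prev, worker_models, delta_prev,
              block_momentum=0.9, block_lr=1.0):
    """One blockwise model-update filtering (BMUF) step.

    w_prev        -- global model parameters before this data block
    worker_models -- list of parameter arrays, each optimized in parallel
                     on one split of the current data block
    delta_prev    -- filtered model update carried over from the previous block
    Hyperparameter names here are illustrative, not from the paper.
    """
    # Intra-block parallel optimization ends with model aggregation:
    # average the parameters produced by the parallel workers.
    w_bar = np.mean(worker_models, axis=0)

    # Raw model update contributed by this block.
    g = w_bar - w_prev

    # Blockwise filtering: smooth the update with block-level momentum,
    # analogous to classical momentum but applied per data block.
    delta = block_momentum * delta_prev + block_lr * g

    # Apply the filtered update to obtain the new global model.
    w_new = w_prev + delta
    return w_new, delta
```

With `block_momentum=0` and `block_lr=1` this reduces to plain model averaging; the momentum term is what stabilizes learning as the number of parallel workers grows.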