Distributed SGD With Flexible Gradient Compression

Abstract
We design and evaluate a new algorithm called FLEXCOMPRESSSGD for training deep neural networks on distributed datasets with multiple workers and a central server. In FLEXCOMPRESSSGD, all gradients transmitted between the workers and the server are compressed, and each worker is allowed to flexibly choose a compression method different from that of the server. This flexibility significantly helps reduce the communication cost from each worker to the server. We mathematically prove that FLEXCOMPRESSSGD converges at rate 1/√(MT), where M is the number of distributed workers and T is the number of training iterations. We experimentally demonstrate that FLEXCOMPRESSSGD obtains competitive top-1 testing accuracy on the ImageNet dataset while reducing the worker-to-server communication cost by more than 70% compared with the state-of-the-art.
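To make the flexible-compression idea concrete, below is a minimal, hypothetical sketch in which workers sparsify their gradients with a top-k compressor while the server re-compresses the aggregated gradient with a scaled-sign method before broadcasting. The function names (topk_compress, sign_compress, flex_compress_sgd) and the specific compressor choices are illustrative assumptions, not the algorithm specified in the paper.

import numpy as np

def topk_compress(g, k):
    """Keep the k largest-magnitude entries, zero the rest (a common worker-side sparsifier)."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

def sign_compress(g):
    """Scaled sign compressor (a common 1-bit-style scheme), used here as the server-side method."""
    return np.sign(g) * (np.linalg.norm(g, 1) / g.size)

def flex_compress_sgd(grad_fn, x0, M=4, T=200, lr=0.1, k=2, seed=0):
    """Toy simulation: each worker compresses its gradient (top-k here),
    the server averages the messages and re-compresses with a different
    method (scaled sign here), and all workers apply the same update."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(T):
        # Workers: compute stochastic gradients and compress before "sending".
        worker_msgs = [topk_compress(grad_fn(x, rng), k) for _ in range(M)]
        # Server: aggregate, then compress the broadcast with its own method.
        broadcast = sign_compress(np.mean(worker_msgs, axis=0))
        # Workers: identical model update from the compressed broadcast.
        x -= lr * broadcast
    return x

if __name__ == "__main__":
    # Toy objective f(x) = 0.5 * ||x||^2 with noisy gradients g = x + noise.
    grad_fn = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
    x_final = flex_compress_sgd(grad_fn, x0=np.ones(10))
    print("final ||x|| =", np.linalg.norm(x_final))

In this toy setting the worker-to-server traffic per round is only the k retained coordinates per worker, while the server-to-worker broadcast is essentially one sign bit per coordinate plus a scalar, which is the kind of asymmetric saving the flexible choice of compressors is meant to enable.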
