
\begin{abstract}

Sign-based algorithms (e.g., {\signsgd}) have been proposed as a biased gradient compression technique to alleviate the communication bottleneck in training large neural networks across multiple workers. We show simple convex counter-examples where {\signsgd} does not converge to the optimum.

Further, even when it does converge, {\signsgd} may generalize poorly compared with SGD. These issues arise because the sign compression operator is biased.

We then show that error-feedback, i.e., incorporating the error made by the compression operator into the next step, overcomes these issues. We prove that our algorithm (\ecsgd) with an arbitrary compression operator achieves the \emph{same rate of convergence} as SGD without any additional assumptions. Thus \ecsgd\ achieves gradient compression \emph{for free}. Our experiments thoroughly substantiate the theory and show that error-feedback improves both convergence and generalization. Code can be found at \url{https://github.com/epfml/error-feedback-SGD}.

%

\end{abstract}
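
As a sketch of the error-feedback step described in the abstract, one iteration of \ecsgd\ can be written as follows. The notation is assumed here for illustration and is not fixed by the abstract: $x_t$ is the iterate, $g_t$ a stochastic gradient at $x_t$, $\gamma$ the step size, $\mathcal{C}$ the compression operator (for instance a scaled sign), and $e_t$ the accumulated compression error with $e_0 = 0$.
\[
  p_t = \gamma g_t + e_t, \qquad
  \Delta_t = \mathcal{C}(p_t), \qquad
  x_{t+1} = x_t - \Delta_t, \qquad
  e_{t+1} = p_t - \Delta_t .
\]
Under these assumptions only the compressed vector $\Delta_t$ needs to be communicated, while the residual $e_{t+1}$ stays local and is added back before the next compression. Subtracting the last update from the third gives
\[
  x_{t+1} - e_{t+1} = (x_t - e_t) - \gamma g_t ,
\]
so the error-corrected sequence $x_t - e_t$ takes uncompressed SGD-style steps, which gives some intuition for why the bias of $\mathcal{C}$ need not accumulate.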
