1: \begin{abstract}
2: Sign-based algorithms (e.g. {\signsgd}) have been proposed as a biased gradient compression technique to alleviate the communication bottleneck in training large neural networks across multiple workers. We show simple convex counter-examples where signSGD does not converge to the optimum.
3: Further, even when it does converge, signSGD may generalize poorly when compared with SGD. These issues arise because of the biased nature of the sign compression operator.
4:
5:
6: We then show that using error-feedback, i.e. incorporating the error made by the compression operator into the next step, overcomes these issues. We prove that our algorithm (\ecsgd) with arbitrary compression operator achieves the \emph{same rate of convergence} as SGD without any additional assumptions. Thus \ecsgd\ achieves gradient compression \emph{for free}. Our experiments thoroughly substantiate the theory and show that error-feedback improves both convergence and generalization. Code can be found at \url{https://github.com/epfml/error-feedback-SGD}.
7: %
8: \end{abstract}
9: