\begin{abstract}

We propose a novel algorithm for distributed stochastic gradient descent (SGD) with compressed gradient communication in the parameter-server framework. Our gradient compression technique, named flattened one-bit stochastic gradient descent (\ouralgo), relies on two simple algorithmic ideas: \emph{(i)} a one-bit quantization procedure leveraging the technique of dithering, and \emph{(ii)} a randomized fast Walsh-Hadamard transform to flatten the stochastic gradient before quantization.
The resulting approximation of the true gradient is biased, but the scheme avoids commonly encountered algorithmic problems such as exploding variance in the one-bit compression regime, performance degradation for sparse gradients, and restrictive assumptions on the distribution of the stochastic gradients.
Despite this bias, we show SGD-like convergence guarantees under mild conditions.
The compression technique can be applied in both directions of worker-server communication, thus enabling distributed optimization with full communication compression.

\end{abstract}
