\begin{abstract}

We consider decentralized stochastic optimization with the objective function (e.g.\ data samples for a machine learning task)
being distributed over $n$ machines that can only communicate with their neighbors on a fixed communication graph.
To reduce the communication bottleneck, the nodes compress (e.g.\ quantize or sparsify) their model updates. We cover both unbiased and biased compression operators with quality denoted by $\omega \leq 1$ ($\omega=1$ meaning no compression).
\\

We (i) propose a novel gossip-based stochastic gradient descent algorithm, \algopt,
%
that converges at rate $\cO\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations %
and $\delta$ the eigengap of the connectivity matrix.
Despite compression quality and network connectivity affecting the higher-order terms, the first term in the rate, $\cO(1/(nT))$, is the same as for the centralized baseline with exact communication.
%
%
We (ii) present a novel gossip algorithm, \algcons, for the average consensus problem that converges in time $\cO(1/(\delta^2\omega) \log (1/\epsilon))$
for accuracy $\epsilon > 0$. To the best of our knowledge, this is the first gossip algorithm that supports arbitrary compressed messages for $\omega > 0$ and still exhibits linear convergence. We (iii) show in experiments that both of our algorithms outperform the respective state-of-the-art baselines and that \algopt can reduce communication by at least two orders of magnitude.

\end{abstract}
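
As a hedged illustration of the compression quality $\omega$ referred to in the abstract, one common way to formalize it (an assumption here, since the abstract itself does not state the definition) is to require that a compression operator $Q$ satisfies, for every vector $x$ of dimension $d$,
\[
  \mathrm{E}\, \| Q(x) - x \|^2 \;\leq\; (1 - \omega)\, \| x \|^2 ,
\]
where the expectation is over the internal randomness of $Q$, if any. Under this reading, $\omega = 1$ forces $Q(x) = x$, which matches ``no compression'', and keeping only the $k$ largest-magnitude coordinates of $x$ satisfies the bound with $\omega = k/d$. As a rough worked example under the same assumption, sending $k = d/100$ coordinates gives $\omega = 1/100$, so the consensus bound $\cO(1/(\delta^2\omega) \log (1/\epsilon))$ grows by a factor of $100$ relative to $\omega = 1$ at fixed $\delta$, while the leading term $\cO(1/(nT))$ of the optimization rate is unaffected.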
