68c7396e7df1771b.tex
1: \begin{abstract}
2: We consider decentralized stochastic optimization in which the objective function (e.g.\ the data samples of a machine learning task) 
3: is distributed over $n$ machines that can only communicate with their neighbors on a fixed communication graph. 
4: To reduce the communication bottleneck, the nodes compress (e.g.\ quantize or sparsify) their model updates. We cover both unbiased and biased compression operators whose quality is denoted by $\omega \leq 1$ ($\omega=1$ meaning no compression).
5: \\
6: We (i) propose a novel gossip-based stochastic gradient descent algorithm, \algopt, 
7: %
8: that converges at rate $\cO\left(1/(nT) + 1/(T \delta^2 \omega)^2\right)$ for strongly convex objectives, where $T$ denotes the number of iterations %
9: and $\delta$ the eigengap of the connectivity matrix. 
10: Although compression quality and network connectivity affect the higher-order terms, the first term in the rate, $\cO(1/(nT))$, is the same as for the centralized baseline with exact communication.
11: %
12: %
13: We (ii) present a novel gossip algorithm, \algcons, for the average consensus problem that converges in time $\cO(1/(\delta^2\omega) \log (1/\epsilon))$ 
14: for accuracy $\epsilon > 0$. This is (to the best of our knowledge) the first gossip algorithm that supports arbitrary compressed messages for $\omega > 0$ and still exhibits linear convergence. We (iii) show in experiments that both of our algorithms outperform the respective state-of-the-art baselines and that \algopt can reduce communication by at least two orders of magnitude.
15: \end{abstract}
16:
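As a rough reading aid for the rate stated in the abstract, the following sketch compares its two terms. It ignores the constants hidden in the $\cO(\cdot)$ notation, so it only indicates the order of magnitude at which the leading term takes over and is not a result claimed by the analysis. The higher-order term is no larger than the leading term as soon as
\[
\frac{1}{(T\delta^2\omega)^2} \leq \frac{1}{nT}
\quad\Longleftrightarrow\quad
nT \leq T^2\delta^4\omega^2
\quad\Longleftrightarrow\quad
T \geq \frac{n}{\delta^4\omega^2}.
\]
For instance, with the purely hypothetical values $n = 16$, $\delta = 0.5$ and $\omega = 0.1$, we get $\delta^4\omega^2 = 6.25 \cdot 10^{-4}$, so the threshold is $T \geq 16 / (6.25 \cdot 10^{-4}) = 25{,}600$ iterations. Beyond this point the rate is governed by the $\cO(1/(nT))$ term, which depends on neither $\delta$ nor $\omega$.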