
\begin{abstract}

We consider distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server. To alleviate the communication bottleneck, recent work has proposed various schemes to compress (e.g.\ quantize or sparsify) the gradients, thereby introducing additional variance $\omega \geq 1$ that might slow down convergence. For strongly convex functions with condition number $\kappa$ distributed among $n$ machines, we

(i) give a scheme that converges in $\cO((\kappa + \kappa \frac{\omega}{n} + \omega) \log (1/\epsilon))$ steps to a neighborhood of the optimal solution.

% This generalizes a previous method and significantly reduces over the trivial implementation that would require $\cO(\kappa \omega)$ steps.

For objective functions with a finite-sum structure, each worker having fewer than $m$ components, we

(ii) present novel variance-reduced schemes that converge in $\cO((\kappa + \kappa \frac{\omega}{n} + \omega + m)\log(1/\epsilon))$ steps to arbitrary accuracy $\epsilon > 0$. These are the first methods that achieve linear convergence for arbitrary quantized updates.

We also (iii) provide an analysis of the weakly convex and non-convex cases and

(iv) verify in experiments that our novel variance-reduced schemes are more efficient than the baselines.

\end{abstract}
