\begin{abstract}
%\vspace{-1em}

Optimizing distributed learning systems is an art
of balancing computation and communication.
There have been two lines of research that try to
deal with slower networks: {\em communication
compression} for
low-bandwidth networks, and {\em decentralization} for
high-latency networks. In this paper, we explore
a natural question: {\em can the combination
of both techniques lead to
a system that is robust to both bandwidth
and latency?}

Although the system implication of such a combination
is trivial, the underlying theoretical principles and
algorithm design are challenging: unlike in centralized algorithms, simply compressing
{\rc the exchanged information,
even in an unbiased stochastic way,
within a decentralized network would accumulate error and cause the algorithm to fail to converge.}
In this paper, we develop
a framework for compressed, decentralized training and
propose two different strategies, which we call
{\em extrapolation compression} and {\em difference compression}.
We analyze both algorithms and prove that
both converge at a rate of $O(1/\sqrt{nT})$,
where $n$ is the number of workers and $T$ is the
number of iterations, matching the convergence rate of
full-precision, centralized training. We validate
our algorithms and find that our proposed algorithms significantly outperform
the best of the merely decentralized and merely quantized
algorithms on networks with {\em both}
high latency and low bandwidth.
\end{abstract}