
\begin{abstract}

    We develop a new approach to tackle communication constraints in a distributed learning problem with a central server.

    We propose and analyze a new algorithm that performs bidirectional compression and achieves the same convergence rate as algorithms using only uplink (from the local workers to the central server) compression.

    To obtain this improvement, we design \MCM, an algorithm such that the downlink compression \emph{only impacts local models}, while the global model is preserved.

    As a result, and contrary to previous works, the gradients on local servers are computed on \emph{perturbed models}. Consequently, convergence proofs are more challenging and require precise control of this perturbation. To ensure this control, \MCM~additionally combines model compression with a memory mechanism.

    This analysis opens new doors, e.g., incorporating worker-dependent randomized models and partial participation.

\end{abstract}
