\begin{abstract}
    We develop a new approach to tackle communication constraints in a distributed learning problem with a central server.
    We propose and analyze a new algorithm that performs bidirectional compression and achieves the same convergence rate as algorithms using only uplink (from the local workers to the central server) compression.
    To obtain this improvement, we design \MCM, an algorithm such that the downlink compression \emph{only impacts local models}, while the global model is preserved.
    As a result, and contrary to previous works, the gradients on the local workers are computed on \emph{perturbed models}. Consequently, convergence proofs are more challenging and require precise control of this perturbation. To ensure this control, \MCM~additionally combines model compression with a memory mechanism.
    This analysis opens new doors, e.g., incorporating worker-dependent randomized models and partial participation.
\end{abstract}