\begin{abstract}
We develop a new approach to tackle communication constraints in a distributed learning problem with a central server.
We propose and analyze a new algorithm that performs bidirectional compression and achieves the same convergence rate as algorithms using only uplink (from the local workers to the central server) compression.
To obtain this improvement, we design \MCM, an algorithm in which the downlink compression \emph{only impacts local models}, while the global model is preserved.
As a result, and contrary to previous works, the gradients on the local workers are computed on \emph{perturbed models}. Consequently, convergence proofs are more challenging and require precise control of this perturbation. To ensure this control, \MCM~additionally combines model compression with a memory mechanism.
This analysis opens new doors, e.g., incorporating worker-dependent randomized models and partial participation.
\end{abstract}
