\begin{abstract}
    Distributed learning algorithms, such as those employed in Federated Learning (FL), require communication compression to reduce the cost of client uploads.
    The compression methods used in practice are often biased and therefore require error feedback to achieve convergence when the compression is aggressive.
    In turn, error feedback requires client-specific control variates, which directly contradicts privacy-preserving principles and requires stateful clients.
    In this paper, we propose \textit{\algnamelong}, a novel distributed learning framework that allows highly compressible client updates by exploiting past aggregated updates and does not require control variates.
    We consider Distributed Gradient Descent (DGD) as a representative algorithm and provide a theoretical proof of \algname's superiority over Distributed Compressed Gradient Descent (DCGD) with biased compression in the non-smooth regime with bounded gradient dissimilarity.
    Experimental results confirm that \algname{} consistently outperforms distributed learning with direct compression and highlight the compressibility of the client updates under \algname{}.
\end{abstract}
