1: \begin{abstract}
2: Distributed learning algorithms, such as the ones employed in Federated Learning (FL), require communication compression to reduce the cost of client uploads.
3: The compression methods used in practice are often biased and therefore require error feedback to achieve convergence when the compression is aggressive.
4: In turn, error feedback requires client-specific control variates, which directly contradicts privacy-preserving principles and requires stateful clients.
5: In this paper, we propose \textit{\algnamelong}, a novel distributed learning framework that allows highly compressible client updates by exploiting past aggregated updates, and does not require control variates.
6: We consider Distributed Gradient Descent (DGD) as a representative algorithm and provide a theoretical proof of \algname's superiority to Distributed Compressed Gradient Descent (DCGD) with biased compression in the non-smooth regime with bounded gradient dissimilarity.
7: Experimental results confirm that \algname{} consistently outperforms distributed learning with direct compression and highlight the compressibility of the client updates with \algname{}.
8: \end{abstract}
9: