abstract:2f37228306955081.tex

1: \begin{abstract}

2:     Distributed learning algorithms, such as the ones employed in Federated Learning (FL), require communication compression to reduce the cost of client uploads.

3:     The compression methods used in practice are often biased and therefore require error feedback to achieve convergence when the compression is aggressive.

4:     In turn, error feedback requires client-specific control variates, which directly contradicts privacy-preserving principles and requires stateful clients.

5:     In this paper, we propose \textit{\algnamelong}, a novel distributed learning framework that allows highly compressible client updates by exploiting past aggregated updates, and does not require control variates.

6:     We consider Distributed Gradient Descent (DGD) as a representative algorithm and provide a theoretical proof of \algname's superiority to Distributed Compressed Gradient Descent (DCGD) with biased compression in the non-smooth regime with bounded gradient dissimilarity.

7:     Experimental results confirm that \algname{} consistently outperforms distributed learning with direct compression and highlight the compressibility of the client updates with \algname{}.

8: \end{abstract}

9: