\begin{abstract}
In this paper, we question the rationale behind propagating large numbers of parameters through a distributed system during federated learning. We start by examining the rank characteristics of the subspace spanned by gradients \shams{across epochs} (i.e., the gradient-space) in centralized model training, and observe that \shams{this} gradient-space often consists of a few leading principal components that account for an overwhelming majority ($95$--$99\%$) of the explained variance. Motivated by this, we propose the ``Look-back Gradient Multiplier'' ({\tt LBGM}) algorithm, which \shams{exploits} this low-rank property \shams{to enable gradient recycling} between model update rounds \shams{of federated learning, reducing transmissions of large parameter vectors to single scalars for aggregation}. We analytically characterize the convergence behavior of {\tt LBGM}, revealing the nature of the trade-off between communication savings and model performance. Our subsequent experimental results demonstrate the reduction in communication overhead that {\tt LBGM} achieves compared to \shams{conventional} federated learning \shams{on several datasets and deep learning models}. Additionally, we show that {\tt LBGM} is a general plug-and-play algorithm that can be used standalone or stacked on top of existing sparsification techniques for distributed model training.
\end{abstract}