\begin{abstract}
Federated Learning~(FL) has emerged as a de facto machine learning area and has attracted rapidly growing research interest from the community.
However, catastrophic forgetting caused by data heterogeneity and partial participation poses distinctive challenges for FL and is detrimental to model performance.
To tackle these problems,
we propose a new FL approach~(namely GradMA), which takes inspiration from continual learning to simultaneously correct the server-side and worker-side update directions while taking full advantage of the server's rich computing and memory resources.
Furthermore, we devise a memory reduction strategy that enables GradMA to accommodate FL with a large number of workers.
We then analyze the convergence of GradMA theoretically in the smooth non-convex setting and show that its convergence rate achieves a linear speedup w.r.t.\ the increasing number of sampled active workers.
Finally, extensive experiments on various image classification tasks show that GradMA achieves significant gains in accuracy and communication efficiency compared to SOTA baselines.
Our code is available at
\href{https://github.com/lkyddd/GradMA}{https://github.com/lkyddd/GradMA}.
% \li{Meanwhile, ablation studies demonstrate efficacy and indispensability for core modules and key parameters}.

\end{abstract}
