abstract:4001e6ba373d951b.tex

1: \begin{abstract}

2: Federated Learning~(FL) has emerged as a de facto machine learning area and has attracted rapidly increasing research interest from the community.

3: However, catastrophic forgetting caused by data heterogeneity and partial participation poses distinctive challenges for FL and is detrimental to performance.

4: To tackle these problems,

5: we propose a new FL approach~(namely GradMA), which takes inspiration from continual learning to simultaneously correct the server-side and worker-side update directions while taking full advantage of the server's rich computing and memory resources.

6: Furthermore, we develop a memory reduction strategy that enables GradMA to accommodate FL with a large number of workers.

7: We then analyze the convergence of GradMA theoretically in the smooth non-convex setting and show that its convergence rate achieves a linear speedup w.r.t.\ the number of sampled active workers.

8: Finally, extensive experiments on various image classification tasks show that GradMA achieves significant gains in accuracy and communication efficiency compared to SOTA baselines.

9: We provide our code here:

10: \href{https://github.com/lkyddd/GradMA}{https://github.com/lkyddd/GradMA}.

11: % \li{Meanwhile, ablation studies demonstrate efficacy and indispensability for core modules and key parameters}.

12:

13: \end{abstract}

14: