
\begin{abstract}

    Federated Learning (FL), a distributed learning paradigm that scales on-device learning collaboratively, has emerged as a promising approach for decentralized AI applications.
% FL has gained increasing popularity due to its communication efficiency, massive decentralized computations, agile personalized service, and privacy preservation.
Local optimization methods such as Federated Averaging (\fedavg) are the most prominent methods for FL applications.
Despite their simplicity and popularity, the theoretical understanding of local optimization methods is far from clear.
This dissertation aims to advance the theoretical foundation of local methods in the following three directions.

First, we establish sharp bounds for \fedavg, the most popular algorithm in Federated Learning.
We demonstrate how \fedavg may suffer from a notion we call iterate bias, and how an additional third-order smoothness assumption may mitigate this effect and lead to better convergence rates. We explain this phenomenon from a Stochastic Differential Equation (SDE) perspective.

Second, we propose \fedacfull (\fedac), the first principled acceleration of \fedavg, which provably improves the convergence rate and communication efficiency. Our technique relies on a potential-based perturbed iterate analysis, a novel stability analysis of generalized accelerated SGD, and a strategic tradeoff between acceleration and stability.

Third, we study the Federated Composite Optimization problem, which extends the classic smooth setting by incorporating a shared non-smooth regularizer. We show that direct extensions of \fedavg may suffer from the ``curse of primal averaging,'' resulting in slow convergence. As a solution, we propose a new primal-dual algorithm, Federated Dual Averaging, which overcomes the curse of primal averaging by employing a novel inter-client dual averaging procedure.

\end{abstract}
