1: \begin{abstract}
2: Federated Learning (FL), a distributed learning paradigm that scales on-device learning collaboratively, has emerged as a promising approach for decentralized AI applications.
3: % FL has gained increasing popularity due to its communication efficiency, massive decentralized computations, agile personalized service, and privacy preservation.
4: Local optimization methods such as Federated Averaging (\fedavg) are the most prominent methods for FL applications.
5: Despite their simplicity and popularity, the theoretical understanding of local optimization methods is far from clear.
6: This dissertation aims to advance the theoretical foundation of local methods in the following three directions.
7:
8: First, we establish sharp bounds for \fedavg, the most popular algorithm in Federated Learning.
9: We demonstrate how \fedavg may suffer from a notion we call iterate bias, and how an additional third-order smoothness assumption may mitigate this effect and lead to better convergence rates. We explain this phenomenon from a Stochastic Differential Equation (SDE) perspective.
10:
11: Second, we propose \fedacfull (\fedac), the first principled acceleration of \fedavg, which provably improves the convergence rate and communication efficiency. Our technique relies on a potential-based perturbed iterate analysis, a novel stability analysis of generalized accelerated SGD, and a strategic tradeoff between acceleration and stability.
12:
13: Third, we study the Federated Composite Optimization problem, which extends the classic smooth setting by incorporating a shared non-smooth regularizer. We show that direct extensions of \fedavg may suffer from the ``curse of primal averaging,'' resulting in slow convergence. As a solution, we propose a new primal-dual algorithm, Federated Dual Averaging, which overcomes the curse of primal averaging by employing a novel inter-client dual averaging procedure.
14: \end{abstract}
15: