\begin{abstract}
Federated learning (FL) aims to minimize the communication complexity of training a model over heterogeneous data distributed across many clients. A common approach is \localms, where clients take multiple optimization steps over local data before communicating with the server (e.g., \lsgd). \Localms~can exploit similarity between clients' data. However, in existing analyses, this comes at the cost of slow convergence with respect to the number of communication rounds $R$. On the other hand, \globalms, where clients simply return a gradient vector in each round (e.g., \mbsgd), converge faster in terms of $R$ but fail to exploit the similarity between clients even when clients are homogeneous.
We propose \fedchain, an algorithmic framework that combines the strengths of \localms~and \globalms~to achieve fast convergence in terms of $R$ while leveraging the similarity between clients. Using \fedchain, we instantiate algorithms that improve upon previously known rates in the general convex and PL settings, and are near-optimal (via an algorithm-independent lower bound that we show) for problems that satisfy strong convexity. Empirical results support this theoretical gain over existing methods.
\end{abstract}