\begin{abstract}
   Federated learning (FL) aims to minimize the communication complexity of training a model over heterogeneous data distributed across many clients. A common approach is \localms, where clients take multiple optimization steps over local data before communicating with the server (e.g., \lsgd). \Localms~can exploit similarity between clients' data. However, in existing analyses, this comes at the cost of slow convergence in the number of communication rounds $R$. On the other hand, \globalms, where clients simply return a gradient vector in each round (e.g., \mbsgd), converge faster in terms of $R$ but fail to exploit the similarity between clients even when clients are homogeneous.
   We propose \fedchain, an algorithmic framework that combines the strengths of \localms~and \globalms~to achieve fast convergence in terms of $R$ while leveraging the similarity between clients. Using \fedchain, we instantiate algorithms that improve upon previously known rates in the general convex and PL settings, and are near-optimal (via an algorithm-independent lower bound that we prove) for problems that satisfy strong convexity. Empirical results support this theoretical gain over existing methods.
\end{abstract}