
\begin{abstract}


\algname{LocalSGD} and \algname{SCAFFOLD} are widely used methods in distributed stochastic optimization, with numerous applications in machine learning, large-scale data processing, and federated learning. However, rigorously establishing their theoretical advantages over simpler methods, such as minibatch SGD (\algname{MbSGD}), has proven challenging, as existing analyses often rely on strong assumptions, unrealistic premises, or overly restrictive scenarios.


In this work, we revisit the convergence properties of \algname{LocalSGD} and \algname{SCAFFOLD} under a variety of existing or weaker conditions, including gradient similarity, Hessian similarity, weak convexity, and Lipschitz continuity of the Hessian. Our analysis shows that (i) \algname{LocalSGD} converges faster than \algname{MbSGD} for weakly convex functions without requiring stronger gradient similarity assumptions; (ii) \algname{LocalSGD} benefits significantly from higher-order similarity and smoothness; and (iii) \algname{SCAFFOLD} converges faster than \algname{MbSGD} for a broader class of non-quadratic functions. These theoretical insights clarify the conditions under which \algname{LocalSGD} and \algname{SCAFFOLD} outperform \algname{MbSGD}.

\end{abstract}
