
\begin{abstract}

We revisit \algname{FedExProx}, a recently proposed distributed optimization method designed to improve the convergence of parallel proximal algorithms via extrapolation. In the process, we uncover a surprising flaw: its known theoretical guarantees on quadratic optimization tasks are no better than those offered by vanilla Gradient Descent (\algname{GD}). Motivated by this observation, we develop a novel analysis framework that establishes a tighter linear convergence rate for non-strongly convex quadratic problems. By accounting for both computation and communication costs, we demonstrate that \algname{FedExProx} can provably outperform \algname{GD}, in stark contrast to the conclusions of the original analysis. Furthermore, we consider partial participation scenarios and analyze two adaptive extrapolation strategies, one based on gradient diversity and one based on Polyak stepsizes, again significantly improving on previous results. Moving beyond quadratics, we extend our analysis to general functions satisfying the Polyak--Łojasiewicz condition, improving on the previous strongly convex analysis while operating under weaker assumptions. Backed by empirical results, our findings reveal a stronger potential of \algname{FedExProx} than previously recognized, paving the way for further exploration of the benefits of extrapolation in federated learning.


\end{abstract}
