\begin{abstract}
In distributed learning, \textbf{local SGD} (also known as federated averaging) and its simple baseline \textbf{minibatch SGD} are widely studied optimization methods. Most existing analyses of these methods assume independent and unbiased gradient estimates obtained via \emph{with-replacement} sampling. In contrast, we study their \emph{shuffling-based} variants, \textbf{minibatch} and \textbf{local Random Reshuffling}, which draw stochastic gradients without replacement and are thus closer to practice. For smooth functions satisfying the Polyak--{\L}ojasiewicz condition, we obtain convergence bounds (in the large-epoch regime) showing that these shuffling-based variants converge \emph{faster} than their with-replacement counterparts. Moreover, we prove matching lower bounds showing that our convergence analysis is \emph{tight}. Finally, we propose an algorithmic modification called \textbf{synchronized shuffling} that leads to convergence rates \emph{faster} than our lower bounds in near-homogeneous settings.
\end{abstract}