\begin{abstract}
Shuffling gradient methods, also known as stochastic gradient
descent (SGD) without replacement, are widely used in practice,
most notably in three popular variants: Random Reshuffle
(RR), Shuffle Once (SO), and Incremental Gradient (IG). Despite
their empirical success, the theoretical guarantees of shuffling gradient
methods were not well understood for a long time. Only recently
were convergence rates established for the average iterate
for convex functions and for the last iterate for strongly convex problems
(using the squared distance as the metric). However, when the function
value gap is used as the convergence criterion, existing theory cannot explain
the good performance of the last iterate in various settings (e.g.,
constrained optimization). To bridge this gap between practice and
theory, we prove last-iterate convergence rates for shuffling gradient
methods with respect to the objective value, even without strong convexity.
Our new results either (nearly) match the existing last-iterate lower
bounds or are as fast as the previous best upper bounds for the average
iterate.
\end{abstract}
