\begin{abstract}
Shuffling gradient methods, also known as stochastic gradient descent
(SGD) without replacement, are widely used in practice, most notably
in the form of three popular algorithms: Random Reshuffle (RR), Shuffle
Once (SO), and Incremental Gradient (IG). Despite their empirical
success, the theoretical guarantees of shuffling gradient methods were
not well understood for a long time. Only recently were convergence
rates established for the average iterate on convex functions and for
the last iterate on strongly convex problems (using squared distance
as the metric). However, when the function value gap is used as the
convergence criterion, existing theory cannot explain the good
performance of the last iterate in different settings (e.g.,
constrained optimization). To bridge this gap between practice and
theory, we prove last-iterate convergence rates for shuffling gradient
methods with respect to the objective value, even without strong
convexity. Our new results either (nearly) match the existing
last-iterate lower bounds or are as fast as the previous best upper
bounds for the average iterate.
\end{abstract}