
\begin{abstract}

    Recent works by \citet{altschuler2023accelerationPartII} and \citet{grimmer2024accelerated} have shown that it is possible to accelerate the convergence of gradient descent on smooth convex functions, even without momentum, just by picking special stepsizes.

    In this paper, we provide a general theory for composing stepsize schedules that captures all recent advances in this area and more. We propose three notions of ``composable'' stepsize schedules, with elementary associated composition operations for combining them. From these operations, in addition to recovering recent works, we construct three highly optimized sequences of stepsize schedules. We first construct optimized stepsize schedules of every length, generalizing the exponentially spaced silver stepsizes of~\cite{altschuler2023accelerationPartII}. We then construct highly optimized stepsize schedules for minimizing final objective gap or gradient norm, improving on prior rates by constants and, more importantly, matching or beating the numerically computed minimax optimal schedules of~\cite{gupta2023branch}. We conjecture these schedules are in fact minimax (information-theoretic) optimal.

    Several novel tertiary results follow from our theory, including recovery of the recent dynamic gradient norm minimizing short stepsizes of~\cite{rotaru2024exact} and their extension to objective gap minimization.

    % , and produce alternative, stronger dynamic short stepsize schedules.

\end{abstract}
