
\begin{abstract}
    We develop new sub-optimality bounds for gradient descent (GD) that depend on
    the conditioning of the objective along the path of optimization, rather than
    on global, worst-case constants.
    Key to our proofs is directional smoothness, a measure of gradient variation
    that we use to develop upper-bounds on the objective.
    Minimizing these upper-bounds requires solving implicit equations to obtain a
    sequence of strongly adapted step-sizes; we show that these equations are
    straightforward to solve for convex quadratics and lead to new guarantees for
    two classical step-sizes.
    For general functions, we prove that the Polyak step-size and normalized GD
    obtain fast, path-dependent rates despite using no knowledge of the
    directional smoothness.
    Experiments on logistic regression show our convergence guarantees are tighter
    than the classical theory based on \( L \)-smoothness.
\end{abstract}
