\begin{abstract}
We develop new sub-optimality bounds for gradient descent (GD) that depend on
the conditioning of the objective along the path of optimization, rather than
on global, worst-case constants.
Key to our proofs is directional smoothness, a measure of gradient variation
that we use to develop upper-bounds on the objective.
Minimizing these upper-bounds requires solving implicit equations to obtain a
sequence of strongly adapted step-sizes; we show that these equations are
straightforward to solve for convex quadratics and lead to new guarantees for
two classical step-sizes.
For general functions, we prove that the Polyak step-size and normalized GD
obtain fast, path-dependent rates despite using no knowledge of the
directional smoothness.
Experiments on logistic regression show our convergence guarantees are tighter
than the classical theory based on \( L \)-smoothness.
\end{abstract}