\begin{abstract}
    We study the classical optimization problem $\min_{x \in \R^d} f(x)$ and analyze the gradient descent (\algname{GD}) method in both nonconvex and convex settings. It is well known that, under the $L$-smoothness assumption ($\|\nabla^2 f(x)\| \leq L$), the point $x_{k+1}$ that minimizes the quadratic upper bound $f(x_k) + \inp{\nabla f(x_k)}{x_{k+1} - x_k} + \nicefrac{L}{2} \norm{x_{k+1} - x_k}^2$ is $x_{k+1} = x_k - \gamma_k \nabla f(x_k)$ with step size $\gamma_k = \nicefrac{1}{L}$. Surprisingly, a similar result can be derived under the $\ell$-generalized smoothness assumption ($\|\nabla^2 f(x)\| \leq \ell(\norm{\nabla f(x)})$). In this case, we derive the step size $$\gamma_k = \int_{0}^{1} \frac{d v}{\ell(\norm{\nabla f(x_k)} + \norm{\nabla f(x_k)} v)}.$$ Using this step size rule, we improve upon existing theoretical convergence rates and obtain new results in several previously unexplored setups.
\end{abstract}
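
As a quick sanity check of the step size rule stated in the abstract, one can evaluate the integral in closed form for a specific $\ell$. The sketch below assumes the affine choice $\ell(s) = L_0 + L_1 s$ with $L_0, L_1 > 0$ and $\nabla f(x_k) \neq 0$; this choice of $\ell$ and the constants $L_0$, $L_1$ are illustrative assumptions that do not appear in the abstract. Substituting into the integral and integrating in $v$ gives
$$\gamma_k = \int_{0}^{1} \frac{d v}{L_0 + L_1 \norm{\nabla f(x_k)} (1 + v)} = \frac{1}{L_1 \norm{\nabla f(x_k)}} \log\left(1 + \frac{L_1 \norm{\nabla f(x_k)}}{L_0 + L_1 \norm{\nabla f(x_k)}}\right).$$
Under this assumption, letting $L_1 \to 0$ and using $\log(1 + t) \approx t$ for small $t$ yields $\gamma_k \to \nicefrac{1}{L_0}$. More directly, for a constant $\ell \equiv L$ the integrand equals $\nicefrac{1}{L}$ for every $v$, so the rule reduces to the classical step size $\gamma_k = \nicefrac{1}{L}$.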