
\begin{abstract}

Recent studies have shown that many nonconvex machine learning problems satisfy a so-called generalized-smooth condition, which extends beyond the smoothness assumption of traditional nonconvex optimization. However, existing algorithms for generalized-smooth nonconvex optimization have significant limitations in both their design and their convergence analysis.

% Large language models and complex training paradigms challenge the assumption of L-smoothness in optimization, revealing the need for generalized smoothness conditions.

In this work, we first study deterministic generalized-smooth nonconvex optimization and analyze the convergence of normalized gradient descent under the generalized Polyak--{\L}ojasiewicz condition. Our results provide a comprehensive understanding of the interplay between gradient normalization and function geometry.

% introduce and analyze how $\beta$-normalized gradient descent hyper-parameter choices and learning objective geometry characterized by generalized smoothness and P{\L} condition impact convergence.

Then, for stochastic generalized-smooth nonconvex optimization, we propose an independently-normalized stochastic gradient descent algorithm, which leverages independent sampling, gradient normalization, and clipping to achieve an $\mathcal{O}(\epsilon^{-4})$ sample complexity under relaxed assumptions. Experiments demonstrate the fast convergence of our algorithm.

\end{abstract}
