1: \begin{abstract}
2: Recent studies have shown that many nonconvex machine learning problems satisfy a so-called generalized-smooth condition, which relaxes the standard smoothness assumption of traditional nonconvex optimization. However, existing algorithms designed for generalized-smooth nonconvex optimization have significant limitations in both their design and their convergence analysis.
3: % Large language models and complex training paradigms challenge the assumption of L-smoothness in optimization, revealing  the need for generalized smoothness conditions. 
4: In this work, we first study deterministic generalized-smooth nonconvex optimization and analyze the convergence of normalized gradient descent under the generalized Polyak--{\L}ojasiewicz condition. Our results
5: provide a comprehensive understanding of the interplay between gradient normalization and function geometry. 
6: %introduce  and analyze how $\beta$-normalized gradient descent hyper-parameter choices and learning objective geometry characterized by generalized smoothness and P{\L} condition impact convergence. 
7: Then, for stochastic generalized-smooth nonconvex optimization, we propose an independently-normalized stochastic gradient descent algorithm, which leverages independent sampling, gradient normalization and clipping to achieve an $\mathcal{O}(\epsilon^{-4})$ sample complexity under relaxed assumptions. Experiments demonstrate the fast convergence of our algorithm.
8: \end{abstract}
9: