1: \begin{abstract}
2: Bilevel optimization is a popular two-level hierarchical optimization framework that has been widely applied to machine learning tasks such as hyperparameter learning, meta-learning, and continual learning.
3: Although many bilevel optimization methods
4: have recently been developed, they are not well studied when the lower-level problem is nonconvex. To fill this gap, in this paper we study a class of nonconvex
5: bilevel optimization problems in which both the upper-level and lower-level problems are nonconvex and the lower-level problem satisfies the Polyak-{\L}ojasiewicz (PL) condition.
6: We propose an efficient momentum-based gradient bilevel method (MGBiO) to solve these deterministic problems.
7: We also propose a class of efficient momentum-based stochastic gradient bilevel methods (MSGBiO and VR-MSGBiO) to solve the stochastic counterparts of these problems.
8: Moreover, we provide a useful convergence analysis framework for our methods. Specifically, under some mild conditions, we prove that our MGBiO method has a sample (or gradient) complexity of $O(\epsilon^{-2})$ for finding an $\epsilon$-stationary solution of the deterministic bilevel problems (i.e., $\|\nabla F(x)\|\leq \epsilon$),
9: which improves the existing best results by a factor of $O(\epsilon^{-1})$.
10: We also prove that our MSGBiO and VR-MSGBiO methods have sample complexities of $\tilde{O}(\epsilon^{-4})$ and $\tilde{O}(\epsilon^{-3})$, respectively, for finding an
11: $\epsilon$-stationary solution of the stochastic bilevel problems
12: (i.e., $\mathbb{E}\|\nabla F(x)\|\leq \epsilon$),
13: which improve the existing best results by a factor of $\tilde{O}(\epsilon^{-3})$.
14: Extensive experimental results on a bilevel PL game and hyper-representation learning
15: demonstrate the efficiency of our algorithms. This paper commemorates the mathematician Boris Polyak (1935--2023).
16: \end{abstract}
17: