
\begin{abstract}

Bilevel optimization is a popular two-level hierarchical optimization framework that has been widely applied to machine learning tasks such as hyperparameter learning, meta-learning, and continual learning.
Although many bilevel optimization methods have recently been developed, they are not well studied when the lower-level problem is nonconvex. To fill this gap, in this paper we study a class of nonconvex
bilevel optimization problems in which both the upper-level and lower-level problems are nonconvex, and the lower-level problem satisfies the Polyak-{\L}ojasiewicz (PL) condition.
We propose an efficient momentum-based gradient bilevel method (MGBiO) to solve the deterministic problems.
We also propose a class of efficient momentum-based stochastic gradient bilevel methods (MSGBiO and VR-MSGBiO) to solve the stochastic problems.
Moreover, we provide a convergence analysis framework for our methods. Specifically, under mild conditions, we prove that our MGBiO method has a sample (or gradient) complexity of $O(\epsilon^{-2})$ for finding an $\epsilon$-stationary solution of the deterministic bilevel problems (i.e., $\|\nabla F(x)\|\leq \epsilon$),
which improves the existing best results by a factor of $O(\epsilon^{-1})$.
We also prove that our MSGBiO and VR-MSGBiO methods have sample complexities of $\tilde{O}(\epsilon^{-4})$ and $\tilde{O}(\epsilon^{-3})$, respectively, for finding an
$\epsilon$-stationary solution of the stochastic bilevel problems
(i.e., $\mathbb{E}\|\nabla F(x)\|\leq \epsilon$),
which improves the existing best results by a factor of $\tilde{O}(\epsilon^{-3})$.
Extensive experimental results on a bilevel PL game and hyper-representation learning
demonstrate the efficiency of our algorithms. This paper commemorates the mathematician Boris Polyak (1935--2023).
\end{abstract}
