63736ed4b7156dd5.tex
1: \begin{abstract}
2:  Bilevel optimization, the problem of minimizing a \emph{value function} which involves the arg-minimum of another function, appears in many areas of machine learning. In a large scale \new{empirical risk minimization} setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates.
3: To overcome this problem we introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time. These directions are written as a sum, making it straightforward to derive unbiased estimates.
4: The simplicity of our approach allows us to develop global variance reduction algorithms, where the dynamics of all variables is subject to variance reduction.
5: We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has $O(\frac1T)$ convergence rate, and that it achieves linear convergence under Polyak-\L ojasciewicz assumption.
6: This is the first stochastic algorithm for bilevel optimization that verifies either of these properties.
7: Numerical experiments validate the usefulness of our method.
8: \end{abstract}
9: