\begin{abstract}
For finite-sum optimization, variance-reduced gradient methods (VR)
compute at each iteration the gradient of a single function (or of a
mini-batch), and yet achieve faster convergence than SGD thanks to a
carefully crafted lower-variance stochastic gradient estimator that
reuses past gradients. Another important line of research of the
past decade in continuous optimization is that of adaptive algorithms
such as AdaGrad, which dynamically adjust the (possibly
coordinate-wise) learning rate based on past gradients and thereby adapt to
the geometry of the objective function. Variants such as RMSprop and
Adam demonstrate outstanding practical performance that has
contributed to the success of deep learning. In this work, we
present AdaLVR, which combines the AdaGrad algorithm with
\emph{loopless} variance-reduced gradient estimators such as SAGA or L-SVRG,
which benefit from a straightforward construction and a streamlined analysis. We
show that AdaLVR inherits both the good convergence properties of VR
methods and the adaptive nature of AdaGrad: in the case of
$L$-smooth convex functions, we establish a gradient complexity of
$O(n+(L+\sqrt{nL})/\varepsilon)$ without prior knowledge of $L$. Numerical
experiments demonstrate the superiority of AdaLVR over
state-of-the-art methods. Moreover, we empirically show that the
RMSprop and Adam algorithms combined with variance-reduced gradient
estimators achieve even faster convergence.
\end{abstract}