\begin{abstract}
Despite the strong theoretical guarantees that variance-reduced finite-sum
optimization algorithms enjoy, their applicability remains limited to cases
where the memory overhead they introduce (SAG/SAGA) or the periodic full
gradient computation they require (SVRG/SARAH) is manageable.
A promising approach to achieving variance reduction while avoiding these
drawbacks is the use of importance sampling instead of control variates.
While many such methods have been proposed in the literature,
directly proving that they improve the convergence of the resulting optimization
algorithm has remained elusive.
In this work, we propose an importance-sampling-based algorithm we call SRG
(stochastic reweighted gradient).
We analyze the convergence of SRG in the strongly convex case and show that, while
it does not recover the linear rate of control-variate methods, it provably
outperforms SGD.
We pay particular attention to the time and memory overhead of our proposed method,
and design a specialized red-black tree that allows its efficient
implementation. Finally, we present empirical results to support our findings.
\end{abstract}
