\begin{abstract}
\vspace{-.1in}
Many stochastic optimization algorithms estimate the gradient of the cost function on the fly by sampling datapoints uniformly at random from a training set.
%
However, this estimator can have a large variance, which inadvertently slows the convergence of the algorithms.
%
One way to reduce this variance is to sample the datapoints from a carefully selected non-uniform distribution. %, which then need to be determined, and is a challenging task.
%
% Previous work minimizes an upper bound of the variance, but the gap between this upper bound and the optimal variance may remain large.
In this work, we propose a novel non-uniform sampling approach that uses the multi-armed bandit framework.
%
Theoretically, we show that our algorithm asymptotically approximates the optimal variance within a factor of 3.
%
Empirically, we show that this datapoint-selection technique significantly reduces the convergence time and variance of several stochastic optimization algorithms, such as SGD and SAGA.
%
This approach to sampling datapoints is general and can be used in conjunction with \emph{any} algorithm that uses an unbiased gradient estimate; we expect it to have broad applicability beyond the specific examples explored in this work.
\end{abstract}
