
\begin{abstract}

The convergence speed of stochastic gradient descent (SGD) can be improved by actively selecting mini-batches.
We explore sampling schemes in which similar data points are less likely to be selected in the same mini-batch.
In particular, we prove that such repulsive sampling schemes lower the variance of the gradient estimator.
This generalizes recent work on using Determinantal Point Processes (DPPs) for mini-batch diversification (Zhang et al., 2017) to the broader class of repulsive point processes.
We first show that the variance reduction achieved by diversified sampling extends to non-stationary point processes.
We then show that other point processes may be computationally much more efficient than DPPs.
Specifically, we propose and investigate Poisson disk sampling, a technique frequently used in the computer graphics community, for this task.
We show empirically that our approach improves over standard SGD in terms of both convergence speed and final model performance.

\end{abstract}
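The idea of repulsive mini-batch selection can be illustrated with a short Python sketch. It uses dart throwing, one standard way to draw Poisson disk samples: candidate data points are visited in random order and accepted only if they lie at least a radius $r$ away from every point already in the mini-batch. The Euclidean feature-space distance, the fixed radius, the fallback that tops up an under-filled batch, the helper names \texttt{poisson\_disk\_minibatch} and \texttt{minibatch\_mean\_variance}, and the three-cluster toy data are all assumptions made for illustration, not the specification of the proposed method. The sketch only selects indices and does not reweight the gradient estimator.

```python
import numpy as np


def poisson_disk_minibatch(X, batch_size, radius, rng=None):
    """Hypothetical helper: pick mini-batch indices by dart throwing.

    A candidate is accepted only if its feature vector is at least `radius`
    (Euclidean distance) away from every point accepted so far.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    if not 0 < batch_size <= n:
        raise ValueError("batch_size must be between 1 and the number of points")
    accepted, rejected = [], []
    # Visit candidate indices in a uniformly random order (no repeats).
    for idx in rng.permutation(n):
        if len(accepted) == batch_size:
            break
        if accepted:
            dists = np.linalg.norm(X[accepted] - X[idx], axis=1)
            if dists.min() < radius:
                rejected.append(idx)  # too close to an accepted point
                continue
        accepted.append(idx)
    # Assumed fallback: if the radius is too large to fill the batch,
    # top it up with uniformly chosen rejected candidates.
    missing = batch_size - len(accepted)
    if missing > 0:
        extra = rng.choice(rejected, size=missing, replace=False)
        accepted.extend(extra.tolist())
    return np.array(accepted, dtype=int)


def minibatch_mean_variance(y, n_draws, sampler, rng):
    """Empirical variance of the mini-batch mean of y under a given sampler."""
    means = [y[sampler(rng)].mean() for _ in range(n_draws)]
    return float(np.var(means))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: three tight clusters; the scalar y stands in for a
    # per-example gradient whose value depends on the cluster.
    centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
    X = np.concatenate([c + 0.3 * rng.standard_normal((100, 2)) for c in centers])
    y = np.repeat([0.0, 1.0, 2.0], 100) + 0.1 * rng.standard_normal(300)

    def uniform(r):
        return r.choice(len(X), size=3, replace=False)

    def poisson_disk(r):
        return poisson_disk_minibatch(X, 3, radius=2.0, rng=r)

    print("uniform:", minibatch_mean_variance(y, 2000, uniform, rng))
    print("poisson disk:", minibatch_mean_variance(y, 2000, poisson_disk, rng))
```

On this toy data, a Poisson disk batch of three points will almost always contain one point per cluster, so its mini-batch mean should fluctuate much less than under uniform sampling. This is meant only as intuition for the variance-reduction claim, not as a reproduction of the reported experiments.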