\begin{abstract}
Many machine learning problems involve Monte Carlo gradient estimators.
As a prominent example, we focus on Monte Carlo variational inference (\textsc{mcvi}) in this paper. %where one optimizes a lower bound of a Bayesian model's marginal likelihood.
The performance of \textsc{mcvi} crucially depends on the variance of its stochastic gradients. We propose to reduce this variance by means of Quasi-Monte Carlo (\textsc{qmc}) sampling. \textsc{qmc} replaces $N$ i.i.d.\ samples from a uniform probability distribution with a deterministic sequence of $N$ samples.
This sequence covers the underlying random variable space more evenly than i.i.d.\ draws, which reduces the variance of the gradient estimator.
With our novel approach, both the score function and the reparameterization gradient estimators lead to much faster convergence of the optimization.
%We show that for the score function as well as the reparameterization gradient estimators of \textsc{mcvi}, we can achieve significantly faster convergence, when using \textsc{qmc} for estimating the gradients.
We also propose a new algorithm for Monte Carlo objectives that operates with a constant learning rate and increases the number of \textsc{qmc} samples per iteration. We prove that this algorithm can converge asymptotically at a faster rate than \textsc{sgd}.
%When using a constant learning rate and an increasing number of \textsc{qmc} samples per stochastic gradient step, we prove that we can converge asymptotically at a faster rate than \textsc{sgd}.
We furthermore provide theoretical guarantees for \textsc{qmc} on Monte Carlo objectives that go beyond \textsc{mcvi}, and we support our findings with several experiments on large-scale data sets from various domains.
\end{abstract}
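The following Python sketch illustrates the variance-reduction claim of the abstract on a toy problem. It compares plain Monte Carlo with randomized \textsc{qmc} and rests on several assumptions. First, the objective $\mathbb{E}_{z\sim\mathcal{N}(\mu,1)}[z^2]$ and its reparameterization gradient $2(\mu+\epsilon)$ are hypothetical stand-ins for an \textsc{mcvi} objective. Second, all function names are illustrative. Third, the low-discrepancy sequence is a base-2 van der Corput sequence with one random shift (a Cranley-Patterson rotation), chosen only because it fits in the standard library. It is not necessarily the sequence used in this paper.

```python
import random
import statistics
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used to map uniforms to N(0, 1)


def van_der_corput(n: int, base: int = 2) -> float:
    """Return the n-th element (n >= 0) of the base-`base` van der Corput sequence."""
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        x += digit / denom
    return x


def uniforms(num: int, rng: random.Random, qmc: bool) -> list[float]:
    """MC: i.i.d. uniforms. RQMC: first `num` van der Corput points, all shifted by one random offset mod 1."""
    if qmc:
        shift = rng.random()
        pts = [(van_der_corput(i) + shift) % 1.0 for i in range(num)]
    else:
        pts = [rng.random() for _ in range(num)]
    # Keep points strictly inside (0, 1) so the inverse CDF stays finite.
    return [min(max(u, 1e-12), 1.0 - 1e-12) for u in pts]


def reparam_grad(mu: float, num: int, rng: random.Random, qmc: bool) -> float:
    """Reparameterization estimate of d/dmu E_{z ~ N(mu, 1)}[z^2]; the exact value is 2 * mu."""
    eps = [_STD_NORMAL.inv_cdf(u) for u in uniforms(num, rng, qmc)]
    return sum(2.0 * (mu + e) for e in eps) / num


def estimator_variance(qmc: bool, mu: float = 0.5, num: int = 64, reps: int = 200, seed: int = 0):
    """Empirical mean and variance of the gradient estimator over `reps` independent runs."""
    rng = random.Random(seed)
    estimates = [reparam_grad(mu, num, rng, qmc) for _ in range(reps)]
    return statistics.fmean(estimates), statistics.variance(estimates)


if __name__ == "__main__":
    for label, qmc in [("MC", False), ("RQMC", True)]:
        mean, var = estimator_variance(qmc)
        print(f"{label:>4}: mean={mean:.4f} (exact 1.0), variance={var:.2e}")
```

The script prints the mean and empirical variance of both estimators for the same number of samples $N = 64$. The randomized \textsc{qmc} estimator should show a markedly smaller variance. The random shift keeps the estimator unbiased, which is why it is used here instead of a purely deterministic sequence.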
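The abstract describes an algorithm that keeps the learning rate constant and increases the number of \textsc{qmc} samples per iteration. The minimal Python sketch below shows that idea on a toy problem. It runs gradient descent on the hypothetical objective $\mathbb{E}_{z\sim\mathcal{N}(\mu,1)}[(z-c)^2]$ with a fixed step size and doubles the number of randomized \textsc{qmc} samples at each iteration. The doubling factor, the learning rate, the toy objective and the function names are illustrative assumptions. They are not the schedule or hyperparameters analyzed in this paper.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used to map uniforms to N(0, 1)


def van_der_corput(n: int, base: int = 2) -> float:
    """Return the n-th element (n >= 0) of the base-`base` van der Corput sequence."""
    x, denom = 0.0, 1.0
    while n > 0:
        n, digit = divmod(n, base)
        denom *= base
        x += digit / denom
    return x


def rqmc_normals(num: int, rng: random.Random) -> list[float]:
    """N(0, 1) draws from a randomly shifted van der Corput sequence (randomized QMC)."""
    shift = rng.random()
    us = [(van_der_corput(i) + shift) % 1.0 for i in range(num)]
    # Clamp into (0, 1) so the inverse CDF stays finite.
    return [_STD_NORMAL.inv_cdf(min(max(u, 1e-12), 1.0 - 1e-12)) for u in us]


def grad_estimate(mu: float, target: float, num: int, rng: random.Random) -> float:
    """Reparameterization estimate of d/dmu E_{z ~ N(mu, 1)}[(z - target)^2] = 2 (mu - target)."""
    eps = rqmc_normals(num, rng)
    return sum(2.0 * (mu + e - target) for e in eps) / num


def optimize(target: float = 3.0, lr: float = 0.25, n0: int = 2, growth: int = 2,
             iters: int = 12, seed: int = 0) -> list[float]:
    """Gradient descent with a constant learning rate and a geometrically growing sample size."""
    rng = random.Random(seed)
    mu, num, trace = 0.0, n0, []
    for _ in range(iters):
        mu -= lr * grad_estimate(mu, target, num, rng)  # constant step size, no decay
        num *= growth  # use more RQMC samples at the next iteration
        trace.append(mu)
    return trace


if __name__ == "__main__":
    for step, mu in enumerate(optimize()):
        print(f"iteration {step:2d}: mu = {mu:.5f}")
```

The design choice is to let the growing sample size, rather than a decaying learning rate, drive down the gradient noise. For smooth integrands, randomized \textsc{qmc} errors typically shrink faster than the $1/\sqrt{N}$ rate of i.i.d.\ sampling, so each additional sample is worth more than under plain Monte Carlo.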
