\begin{abstract}
Many machine learning problems involve Monte Carlo gradient estimators.
As a prominent example, we focus on Monte Carlo variational inference (\textsc{mcvi}) in this paper. %where one optimizes a lower bound of a Bayesian model's marginal likelihood.
The performance of \textsc{mcvi} crucially depends on the variance of its stochastic gradients. We propose variance reduction by means of Quasi-Monte Carlo (\textsc{qmc}) sampling. \textsc{qmc} replaces $N$ i.i.d.\ samples from a uniform probability distribution with a deterministic sequence of $N$ samples. This sequence
covers the underlying random variable space
more evenly than i.i.d.\ draws, reducing the variance of the gradient estimator.
With our novel approach, both the score function and the reparameterization gradient estimators lead to much faster convergence.
%We show that for the score function as well as the reparameterization gradient estimators of \textsc{mcvi}, we can achieve significantly faster convergence, when using \textsc{qmc} for estimating the gradients.
We also propose a new algorithm for Monte Carlo objectives that
uses a constant learning rate and increases the number of \textsc{qmc} samples per iteration. We prove that this algorithm can converge asymptotically at a faster rate than \textsc{sgd}.
%When using a constant learning rate and an increasing number of \textsc{qmc} samples per stochastic gradient step, we prove that we can converge asymptotically at a faster rate than \textsc{sgd}.
We furthermore provide theoretical guarantees on \textsc{qmc} for Monte Carlo objectives that go beyond \textsc{mcvi}, and support our findings with several experiments on large-scale data sets from various domains.
\end{abstract}
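
To make the sampling idea concrete, the following is a minimal sketch in notation assumed here purely for illustration: the symbols $g$, $\Gamma$, $u_n$, and $d$ are not taken from the abstract. A Monte Carlo gradient estimator averages a per-sample gradient $g$ over $N$ draws, each obtained by transforming a point $u_n \in [0,1]^d$ through a map $\Gamma$:
\[
  \hat{g}_N = \frac{1}{N} \sum_{n=1}^{N} g\bigl(\Gamma(u_n)\bigr).
\]
Standard Monte Carlo draws $u_1, \dots, u_N$ i.i.d.\ from the uniform distribution on $[0,1]^d$, so the variance of $\hat{g}_N$ decays as $\mathcal{O}(N^{-1})$. \textsc{qmc} instead takes $u_1, \dots, u_N$ from a low-discrepancy sequence. For integrands of bounded variation in the sense of Hardy and Krause, the Koksma--Hlawka inequality bounds the integration error by $\mathcal{O}\bigl(N^{-1} (\log N)^{d}\bigr)$, which is one way to see why a more even coverage of $[0,1]^d$ can reduce the error of the gradient estimate.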