\begin{abstract}
  Modern \gls{VI} uses stochastic gradients to avoid intractable
  expectations, enabling large-scale probabilistic inference in
  complex models.  \gls{VI} posits a family of approximating
  distributions $q$ and then finds the member of that family that is
  closest to the exact posterior $p$. Traditionally, \gls{VI}
  algorithms minimize the ``exclusive \gls{KL}'' $\KL{q}{p}$, often
  for computational convenience. Recent research, however, has also
  focused on the ``inclusive \gls{KL}'' $\KL{p}{q}$, which has good
  statistical properties that make it more appropriate for certain
  inference problems.  This paper develops a simple algorithm for
  reliably minimizing the inclusive \gls{KL} using stochastic gradients with vanishing bias. % Consider a valid \gls{MCMC} method, a Markov chain whose stationary distribution is $p$. The algorithm we develop iteratively samples the chain $\latent[k]$, and then uses those samples to follow the score function of the variational approximation, $\nabla \log q(\latent[k])$ with a  Robbins-Monro step-size schedule.
  This method, which we call \gls{MSC}, converges to a local optimum of the inclusive \gls{KL}. It does not suffer from the systematic errors inherent in existing methods, such as Reweighted Wake-Sleep and Neural Adaptive Sequential Monte Carlo, which lead to bias in their final
  estimates.
  % In a variant that ties the variational approximation
  %directly to the Markov chain, \gls{MSC} further provides a new
  %algorithm that melds \gls{VI} and \gls{MCMC}.
  We illustrate convergence on a toy
  model and demonstrate the utility of \gls{MSC} on Bayesian probit
  regression for classification
%  %, deep Markov models to learn the dynamics of simulated spiking neurons,
  as well as a stochastic volatility model for financial data.
\end{abstract}