abstract:b4a4a88290a4bd29.tex

1: \begin{abstract}

2: We highlight a pitfall when applying stochastic variational inference to general Bayesian networks.

3: For global random variables approximated by an exponential family distribution,

4: natural gradient steps, commonly starting from a unit length step size, are averaged to convergence.

5: This useful insight into the scaling of initial step sizes is lost when the approximation factorizes across a general Bayesian network,

6: and care must be taken to ensure practical convergence.

7: We experimentally investigate how much of the baby (well-scaled steps) is thrown out with the bath water (exact gradients).

8: \end{abstract}