1: \begin{abstract}
2: We highlight a pitfall when applying stochastic variational inference to general Bayesian networks.
3: For global random variables approximated by an exponential family distribution,
4: natural gradient steps, commonly starting from a unit length step size, are averaged to convergence.
5: This useful insight into the scaling of initial step sizes is lost when the approximation factorizes across a general Bayesian network,
6: and care must be taken to ensure practical convergence.
7: We experimentally investigate how much of the baby (well-scaled steps) is thrown out with the bath water (exact gradients).
8: \end{abstract}