\begin{abstract}
We provide the first convergence guarantee for black-box variational inference (BBVI) with the reparameterization gradient.
While preliminary investigations analyzed simplified versions of BBVI (\textit{e.g.}, bounded domain, bounded support, or optimizing only the scale), our setup does not require any such algorithmic modifications.
Our results hold for log-smooth posterior densities, with and without strong log-concavity, and for the location-scale variational family.
Notably, our analysis reveals that certain algorithm design choices commonly employed in practice, such as nonlinear parameterizations of the scale matrix, can result in suboptimal convergence rates.
Fortunately, running BBVI with proximal stochastic gradient descent fixes these limitations and thus achieves the strongest known convergence guarantees.
We evaluate this theoretical insight by comparing proximal SGD against other standard implementations of BBVI on large-scale Bayesian inference problems.
\end{abstract}