\begin{abstract}
  It is well known that the reparameterisation gradient estimator, which exhibits low variance in practice, is biased for non-differentiable models. This may compromise the correctness of gradient-based optimisation methods such as stochastic gradient descent (SGD).
  We introduce a simple syntactic framework for defining non-differentiable functions piecewise and present a systematic approach to obtaining smoothings for which the reparameterisation gradient estimator is unbiased.
  Our main contribution is a novel variant of SGD, \emph{Diagonalisation Stochastic Gradient Descent}, which progressively enhances the accuracy of the smoothed approximation \emph{during} optimisation; we prove that it converges to stationary points of the \emph{unsmoothed} (original) objective.
  Our empirical evaluation reveals benefits over the state of the art: our approach is simple, fast and stable, and attains an orders-of-magnitude reduction in work-normalised variance.
\end{abstract}