abstract:fe965c3ca1d846b9.tex

\begin{abstract}
  Stochastic Gradient Descent (SGD) is an important algorithm in
  machine learning.  With constant learning rates, it is a stochastic
  process that, after an initial phase of convergence, generates
  samples from a stationary distribution.  We show that SGD with
  constant rates can be effectively used as an approximate posterior
  inference algorithm for probabilistic modeling.  Specifically, we show
  how to adjust the tuning parameters of SGD so as to match the
  resulting stationary distribution to the posterior.  This analysis
  rests on interpreting SGD as a continuous-time stochastic process
  and then minimizing the Kullback-Leibler divergence between its
  stationary distribution and the target posterior.  (This is in the
  spirit of variational inference.)  In more detail, we model SGD as a
  multivariate Ornstein-Uhlenbeck process and then use properties of
  this process to derive the optimal parameters.  This theoretical
  framework also connects SGD to modern scalable inference algorithms;
  we analyze the recently proposed stochastic gradient Fisher scoring
  from this perspective.  We demonstrate that SGD with properly chosen
  constant rates gives a new way to optimize hyperparameters in
  probabilistic models.
\end{abstract}