
\begin{abstract}

The current interpretation of stochastic gradient descent (SGD) as a stochastic process lacks generality because its numerical scheme restricts the continuous-time dynamics, the loss function, and the distribution of the gradient noise.

We introduce a simplified scheme with milder conditions that flexibly interprets SGD as a discrete-time approximation of an It\^o process.

The scheme also serves as a common foundation for SGD and stochastic gradient Langevin dynamics (SGLD), providing insights into their asymptotic properties.

We investigate the convergence of SGD with biased gradients in terms of the equilibrium mode, and the overestimation problem of the second moment in SGLD.

\end{abstract}
