
\begin{abstract}

  In this paper, we study the stochastic optimization problem from a continuous-time perspective. We propose a stochastic first-order algorithm, called Stochastic Gradient Descent with Momentum (SGDM). We show that the sequence $\{x_k\}$ generated by SGDM, despite its \emph{stochastic} nature, converges in $L_2$-norm to the solution of a \emph{deterministic} second-order Ordinary Differential Equation (ODE) as the stepsize goes to zero. This connection lets the continuous-time analysis serve as a template for the discrete-time one. We first establish convergence results for the ODE through a Lyapunov function, and then translate the whole argument to the discrete-time case. This approach yields an $\widetilde{\mathcal{O}}\left(\frac{\log \frac{1}{\beta}}{\sqrt{k}}\right)$ anytime last-iterate convergence guarantee for SGDM that holds with probability at least $1-\beta$, where $k$ is the number of iterations. Notably, the Lyapunov argument allows us to remove both the projection step and the bounded gradient assumption. Moreover, our algorithm does not rely on quantities that are unknown in general. Additionally, we prove that a subsequence of $\{x_k\}$ generated by SGDM converges in expectation at the rate $o\left(\frac{1}{\sqrt{k}\log\log k}\right)$. To the best of our knowledge, both results improve upon existing work in the field, and both are enabled by the continuous-time perspective.

\end{abstract}
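The following is a minimal illustrative sketch of why a stochastic momentum method can have a deterministic second-order ODE as its small-stepsize limit. The parameterization below is an assumption chosen for illustration and may differ from the exact SGDM scheme analyzed in this paper. We assume the heavy-ball form with stepsize $s$, momentum parameter $1 - a\sqrt{s}$ for a constant $a > 0$, and stochastic gradient $g_k = \nabla f(x_k) + \xi_k$. Here the $\xi_k$ are independent, zero-mean noise terms with $\mathbb{E}\|\xi_k\|^2 \le \sigma^2$. Rearranging the update and dividing by $s$ gives
\begin{align*}
x_{k+1} &= x_k + (1 - a\sqrt{s})(x_k - x_{k-1}) - s\, g_k, \\
\frac{x_{k+1} - 2x_k + x_{k-1}}{s} + a\,\frac{x_k - x_{k-1}}{\sqrt{s}} + \nabla f(x_k) &= -\xi_k.
\end{align*}
We identify $x_k \approx X(t_k)$ with $t_k = k\sqrt{s}$. The two difference quotients then approximate $\ddot{X}(t_k)$ and $\dot{X}(t_k)$, which formally yields $\ddot{X} + a\dot{X} + \nabla f(X) = 0$. The noise drops out in the limit for the following reason. The scaled velocity $v_{k+1} = (x_{k+1} - x_k)/\sqrt{s}$ satisfies $v_{k+1} = (1 - a\sqrt{s})\, v_k - \sqrt{s}\,\nabla f(x_k) - \sqrt{s}\,\xi_k$, so it receives a perturbation of variance at most $s\sigma^2$ per step. A time horizon $T$ covers $T/\sqrt{s}$ steps, and over these steps the independent perturbations accumulate to a variance of at most order $T\sigma^2\sqrt{s}$, which vanishes as $s \to 0$.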
