1: \begin{abstract}
2:   We introduce a family of stochastic optimization methods based on the Runge--Kutta--Chebyshev (RKC) schemes. The RKC methods are explicit methods originally designed for solving stiff ordinary differential equations by ensuring that their stability regions are of maximal size.
3:   In the optimization context, this allows for larger step sizes (learning rates) and greater robustness than, e.g., the popular stochastic gradient descent method.
4:   Our main contribution is a convergence proof for essentially all stochastic Runge--Kutta optimization methods. Under the standard assumptions of strong convexity and Lipschitz-continuous gradients, it shows convergence in expectation with an optimal sublinear rate. For non-convex objectives, we show that the gradients converge to zero in expectation. The proof
5:   requires certain natural conditions on the Runge--Kutta coefficients, and we further demonstrate that the RKC schemes satisfy these.
6:   Finally, we illustrate the improved stability properties of the methods in practice through numerical experiments on both a small-scale test example and a problem arising from an image classification application in machine learning.
7: \end{abstract}
8: