\begin{abstract}
    Stochastic gradient algorithms have been the main focus of large-scale optimization and have
    led to important successes in the recent advancement of deep learning. The convergence of SGD
    depends on a careful choice of the learning rate and on the amount of noise in the stochastic
    estimates of the gradients. In this paper, we propose an adaptive learning rate algorithm that
    uses stochastic curvature information of the loss function to tune the learning rates
    automatically. The element-wise curvature of the loss function is estimated from the local
    statistics of the stochastic first-order gradients. We further propose a new variance
    reduction technique to speed up convergence. In our experiments with deep neural networks, we
    obtained better performance than popular stochastic gradient algorithms.\footnote{This paper
    is an extension/update of our previous paper \cite{gulcehre2014adasecant}.}
\end{abstract}