\begin{abstract}
Stochastic gradient algorithms have been the main focus of large-scale optimization and have led to
important successes in the recent advancement of deep learning algorithms. The convergence
of stochastic gradient descent (SGD) depends on a careful choice of learning rate and on the amount of noise in
the stochastic estimates of the gradients. In this paper, we propose an adaptive learning rate
algorithm that uses stochastic curvature information of the loss function to
tune the learning rates automatically. The element-wise curvature of
the loss function is estimated from the local statistics of the stochastic first-order
gradients. We further propose a new variance reduction technique to speed up convergence.
In our experiments with deep neural networks, we obtained better performance than
popular stochastic gradient algorithms.\footnote{This paper is an extension/update of our
previous paper \cite{gulcehre2014adasecant}.}
\end{abstract}
