c5a76d4865dde3bb.tex
1: \begin{abstract}
2:     Stochastic gradient algorithms have been the main focus of large-scale learning problems and 
3:     they led to important successes in machine learning. The convergence of SGD depends on the
4:     careful choice of learning rate and the amount of the noise in stochastic estimates of the gradients. 
5:     In this paper, we propose a new adaptive learning rate algorithm, which utilizes curvature information 
6:     for automatically tuning the learning rates. The information about the element-wise 
7:     curvature of the loss function is estimated from the local statistics of the stochastic first order
8:     gradients. We further propose a new variance reduction technique to speed up the convergence. In our preliminary experiments with deep
9:     neural networks, we obtained better performance compared to the popular stochastic gradient algorithms.
10: \end{abstract}
11: