\begin{abstract}
Stochastic gradient algorithms have become the primary tool for large-scale learning problems and
have led to important successes in machine learning. The convergence of stochastic gradient descent (SGD)
depends on a careful choice of the learning rate and on the amount of noise in the stochastic estimates of the gradients.
In this paper, we propose a new adaptive learning rate algorithm that uses curvature information
to tune the learning rates automatically. The element-wise
curvature of the loss function is estimated from local statistics of the stochastic first-order
gradients. We also propose a new variance reduction technique to speed up convergence. In preliminary experiments with deep
neural networks, our method achieved better performance than popular stochastic gradient algorithms.
\end{abstract}