
\begin{abstract}

    Stochastic gradient algorithms have been the main focus of large-scale learning problems, and
    they have led to important successes in machine learning. The convergence of stochastic gradient
    descent (SGD) depends on the careful choice of the learning rate and on the amount of noise in the
    stochastic estimates of the gradients. In this paper, we propose a new adaptive learning rate
    algorithm that uses curvature information to tune the learning rates automatically. The
    element-wise curvature of the loss function is estimated from the local statistics of the
    stochastic first-order gradients. We further propose a new variance reduction technique to speed
    up convergence. In our preliminary experiments with deep neural networks, we obtained better
    performance than with popular stochastic gradient algorithms.

\end{abstract}
