\begin{abstract}
    Stochastic gradient algorithms have been the main focus of large-scale optimization and have led to
    important successes in recent advances in deep learning. The convergence of stochastic gradient
    descent (SGD) depends on a careful choice of the learning rate and on the amount of noise in the
    stochastic estimates of the gradients. In this paper, we propose an adaptive learning rate
    algorithm that uses stochastic curvature information of the loss function to tune the learning
    rates automatically. The element-wise curvature of the loss function is estimated from the local
    statistics of the stochastic first-order gradients. We further propose a new variance reduction
    technique to speed up convergence. In our experiments with deep neural networks, we obtained
    better performance than popular stochastic gradient algorithms.\footnote{This paper is an
    extension and update of our previous paper~\cite{gulcehre2014adasecant}.}
\end{abstract}