\begin{abstract}
Incorporating second-order curvature information into gradient-based methods has been shown to improve convergence drastically, despite the additional computational cost.
%Further, studies on stochastic adaptations of quasi-Newton methods in limited memory have gained interest.
In this paper, we propose a stochastic (online) quasi-Newton method with Nesterov's accelerated gradient, in both full- and limited-memory forms, for solving large-scale non-convex optimization problems in neural networks.
%Direction normalization has been introduced to improve stability.
%Further we also include direction normalization to improve stability. %
The performance of the proposed algorithm is evaluated in TensorFlow on benchmark classification and regression problems. The results show improved performance compared with the classical second-order oBFGS and oLBFGS methods and with popular first-order stochastic methods such as SGD and Adam. The effect of different momentum rates and batch sizes on performance is also illustrated.
\keywords{Neural networks \and stochastic method \and online training \and Nesterov's accelerated gradient \and quasi-Newton method \and limited memory \and TensorFlow}
%Stochastic optimization, Nesterov's accelerated gradient, quasi-Newton method, limited memory, Tensorflow 
\end{abstract}