\begin{abstract}
    Training in supervised deep learning is computationally demanding, and its convergence behavior is often not fully understood.
    We introduce and study a second-order stochastic quasi-Gauss--Newton (SQGN) optimization method that combines ideas from stochastic quasi-Newton methods, Gauss--Newton methods, and variance reduction to address these challenges.
    SQGN achieves excellent accuracy without requiring experimentation with many hyperparameter configurations, which is often computationally prohibitive given the number of possible combinations and the cost of each training run.
    We discuss the implementation of SQGN in TensorFlow and compare its convergence and computational performance with those of selected first-order methods on the MNIST benchmark and on a large-scale seismic tomography application from Earth science.
\end{abstract}