1: \begin{abstract}
2: Training in supervised deep learning is computationally demanding, and the convergence behavior is usually not fully understood.
3: We introduce and study a second-order stochastic quasi-Gauss--Newton (SQGN) optimization method that combines ideas from stochastic quasi-Newton methods, Gauss--Newton methods, and variance reduction to address both of these issues.
4: SQGN provides excellent accuracy without requiring experiments over many hyper-parameter configurations, which are often computationally prohibitive given the number of combinations and the cost of each training run.
5: We discuss the implementation of SQGN in TensorFlow, and we compare its convergence and computational performance with those of selected first-order methods on the MNIST benchmark and on a large-scale seismic tomography application from Earth science.
6: \end{abstract}
7: