
\begin{abstract}

   Training in supervised deep learning is computationally demanding, and its convergence behavior is usually not fully understood.

   We introduce and study a second-order stochastic quasi-Gauss--Newton (SQGN) optimization method that combines ideas from stochastic quasi-Newton methods, Gauss--Newton methods, and variance reduction to address these issues.

   SQGN provides excellent accuracy without requiring experimentation with many hyper-parameter configurations, which is often computationally prohibitive given the number of combinations and the cost of each training run.

   We discuss the implementation of SQGN with TensorFlow, and we compare its convergence and computational performance to selected first-order methods using the MNIST benchmark and a large-scale seismic tomography application from Earth science.

\end{abstract}
