
\begin{abstract}
    Deep-learning classification problems have been shown to have a high-curvature subspace in the loss landscape whose dimension equals the number of classes.
    Moreover, this subspace corresponds to the subspace spanned by the logit gradients for each class.
    An obvious strategy for speeding up optimisation would be to use Newton's method in the high-curvature subspace and stochastic gradient descent in the co-space.
    We show that a naive implementation actually slows down convergence, and we speculate on why this might be.
\end{abstract}
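
% Illustrative sketch only: the symbols U_t, H_t, g_t and \eta are assumptions introduced here, not the authors' notation.
As a rough illustration of the hybrid strategy described in the abstract, one possible form of the update is sketched below.
The notation is assumed for illustration only: $\theta_t$ denotes the parameters, $g_t$ the (mini-batch) gradient of the loss, $H_t$ the Hessian, $U_t$ a matrix whose orthonormal columns span the high-curvature subspace, and $\eta$ the learning rate.
The co-space is taken to be the orthogonal complement of that subspace, and $U_t^\top H_t U_t$ is assumed to be invertible.
\[
    \theta_{t+1} = \theta_t
        - U_t \left( U_t^\top H_t U_t \right)^{-1} U_t^\top g_t
        - \eta \left( I - U_t U_t^\top \right) g_t .
\]
The first correction is a Newton step restricted to the subspace, and the second is a gradient step in its complement.
This is a sketch under the stated assumptions and not necessarily the implementation evaluated here.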
