\begin{abstract}
    The loss landscapes of deep-learning classification problems have been shown to contain a high-curvature subspace whose dimension equals the number of classes.
    Moreover, this subspace corresponds to the subspace spanned by the gradients of the per-class logits.
    A natural strategy for speeding up optimisation is therefore to apply Newton's method within the high-curvature subspace and stochastic gradient descent in its complementary subspace.
    We show that a naive implementation of this strategy actually slows convergence, and we speculate on why this might be.
\end{abstract}