\begin{abstract}
In deep-learning classification problems, the loss landscape has been shown to contain a high-curvature subspace whose dimension equals the number of classes.
Moreover, this subspace corresponds to the subspace spanned by the logit gradients for each class.
An obvious strategy to speed up optimisation would be to use Newton's method in the high-curvature subspace and stochastic gradient descent in the co-space.
We show that a naive implementation of this strategy actually slows down convergence, and we speculate on why this might be.
\end{abstract}
