\begin{abstract}
Recent studies of learning algorithms have identified a regime in which the largest eigenvalue of the loss Hessian first increases (progressive sharpening) and then stabilizes near the maximum value that still allows convergence (edge of stability). We consider a class of predictive models that are quadratic in the parameters, which we call second-order regression models. This contrasts with the neural tangent kernel regime, in which the predictive function is linear in the parameters. For quadratic objectives in two dimensions, we prove that this second-order regression model exhibits both progressive sharpening and edge-of-stability behavior. We then show that in higher dimensions the model generically exhibits this behavior, even without the structure of a neural network, owing to a non-linearity induced in the learning dynamics.
This suggests that progressive sharpening and edge of stability are not unique to neural network models and may be a more general property of learning in high-dimensional, non-linear models.
Finally, we show that edge-of-stability behavior in neural networks is correlated with the behavior of quadratic regression models.
\end{abstract}