\begin{abstract}
Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms avoid so-called strict saddle points of the loss function.
However, in many modern machine learning applications, the required regularity conditions are not satisfied.
In particular, this is the case for rectified linear unit (ReLU) networks.
In this paper, we prove a variant of the relevant dynamical systems result, a center-stable manifold theorem, in which we relax some of the regularity requirements.
Then, we verify that shallow ReLU networks fit into the new framework.
Building on a classification of critical points of the square integral loss of shallow ReLU networks measured against an affine target function, we deduce that gradient descent avoids most saddle points.
We proceed to prove convergence to global minima if the initialization is sufficiently good, which is expressed by an explicit threshold on the limiting loss.
\end{abstract}