
\begin{abstract}
Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms avoid so-called strict saddle points of the loss function.
However, in many modern machine learning applications, the required regularity conditions are not satisfied.
In particular, this is the case for rectified linear unit (ReLU) networks.
In this paper, we prove a variant of the relevant dynamical systems result, a center-stable manifold theorem, in which we relax some of the regularity requirements.
Then, we verify that shallow ReLU networks fit into the new framework.
Building on a classification of critical points of the square integral loss of shallow ReLU networks measured against an affine target function, we deduce that gradient descent avoids most saddle points.
We proceed to prove convergence to global minima if the initialization is sufficiently good, which is expressed by an explicit threshold on the limiting loss.
\end{abstract}