\begin{abstract}
Deep learning researchers commonly suggest
that converged models are stuck in local minima.
More recently, some researchers observed
that under reasonable assumptions,
the vast majority of critical points are saddle points, not true minima.
Both descriptions suggest that weights converge around a point in weight space, be it a local optimum or merely a critical point.
However, it's possible that neither interpretation is accurate.
As neural networks are typically over-complete,
it's easy to show the existence of vast continuous regions in weight space with equal loss.
In this paper, we build on recent work empirically characterizing the error surfaces of neural networks.
We analyze training paths through weight space,
presenting evidence that apparent convergence of loss
does not correspond to weights arriving at critical points,
but instead to large movements through flat regions of weight space.
While it's trivial to show that neural network error surfaces are globally non-convex,
we show that error surfaces are also locally non-convex, both after breaking symmetry with a random initialization and after partial training.
\end{abstract}
