\begin{abstract}
Deep learning researchers commonly suggest
that converged models are stuck in local minima.
More recently, some researchers observed
that, under reasonable assumptions,
the vast majority of critical points are saddle points, not true minima.
Both descriptions suggest that the weights converge to a point in weight space, be it a local optimum or merely a critical point.
However, it is possible that neither interpretation is accurate.
Because neural networks are typically over-complete,
it is easy to show that there exist vast continuous regions of weight space with equal loss.
In this paper, we build on recent work that empirically characterizes the error surfaces of neural networks.
We analyze training paths through weight space,
presenting evidence that the apparent convergence of the loss
does not correspond to the weights arriving at critical points
but rather to large movements through flat regions of weight space.
While it is trivial to show that neural network error surfaces are globally non-convex,
we show that they are also locally non-convex, even after symmetry has been broken by random initialization and even after partial training.
\end{abstract}