abstract:1e11fb6f69a3d5e4.tex

1: \begin{abstract}

2: The optimization problem behind neural networks is highly non-convex. Training with stochastic gradient

3: descent and variants requires careful parameter tuning and provides no guarantee to achieve the global

4: optimum. In contrast we show under quite weak assumptions on the data that a particular class of feedforward

5: neural networks can be trained globally optimal with a linear convergence rate with our nonlinear spectral method. Up to our knowledge this is

6: the first practically feasible method which achieves such a guarantee. While the method can in principle be

7: applied to deep networks, we restrict ourselves for simplicity in this paper to one and two hidden layer networks.

8: Our experiments confirm that these models are rich enough to achieve good performance on a series

9: of real-world datasets.

10: \end{abstract}

11: