\begin{abstract}
We propose a hierarchical training algorithm for standard feed-forward neural networks that adaptively extends the network architecture as soon as the optimization reaches a stationary point. By solving small (low-dimensional) optimization problems, the algorithm ensures that the extended network provably escapes any local minimum or stationary point. Under some assumptions on the approximability of the data with \emph{stable} neural networks, we show that the algorithm achieves an optimal convergence rate $s$ in the sense that $\mathrm{loss}\lesssim (\#\mathrm{parameters})^{-s}$. As a byproduct, we obtain computable indicators that assess the optimality of the training state of a given network, and we derive a new notion of generalization error.
\end{abstract}