\begin{abstract}
  We present a framework to define a large class of
  neural networks for which, by construction, training
  by gradient flow provably reaches arbitrarily low loss
  as the number of parameters grows.
  Distinct from the fixed-space
  global optimality results of non-convex optimization,
  this new form of convergence,
  and the techniques introduced
  to prove it,
  pave the way for a usable deep learning convergence theory
  in the near future,
  without overparameterization assumptions relating
  the number of parameters to the number of training samples.
  We define these architectures
  from a simple computation graph
  and a mechanism to lift it, thus increasing the number of parameters,
  which generalizes the idea of increasing the widths of multi-layer perceptrons.
  We show that architectures similar to most common
  deep learning models are present in this class,
  obtained by sparsifying the weight tensors of usual architectures at initialization.
  Leveraging tools from algebraic topology
  and random graph theory,
  we use the computation graph's geometry
  to propagate properties guaranteeing
  convergence to any precision for these large sparse models.
\end{abstract}