\begin{abstract}
We present a framework to define a large class of
neural networks for which, by construction, training
by gradient flow provably reaches arbitrarily low loss
when the number of parameters grows.
Distinct from the fixed-space
global optimality of non-convex optimization,
this new form of convergence,
and the techniques introduced
to prove such convergence,
pave the way for a usable deep learning convergence theory
in the near future,
without overparameterization assumptions relating
the number of parameters and training samples.
We define these architectures
from a simple computation graph
and a mechanism to lift it, thus increasing the number of parameters,
generalizing the idea of increasing the widths of multi-layer perceptrons.
We show that architectures similar to most common
deep learning models are present in this class,
obtained by sparsifying the weight tensors of usual architectures at initialization.
Leveraging tools of algebraic topology
and random graph theory,
we use the computation graph's geometry
to propagate properties guaranteeing
convergence to any precision for these large sparse models.
\end{abstract}
