\begin{abstract}
  We present a framework for defining a large class of
  neural networks for which, by construction, training
  by gradient flow provably reaches arbitrarily low loss
  as the number of parameters grows.
  Distinct from the fixed-space
  global optimality studied in non-convex optimization,
  this new form of convergence,
  together with the techniques introduced
  to prove it,
  paves the way for a usable convergence theory of deep learning
  in the near future,
  without overparameterization assumptions relating
  the number of parameters to the number of training samples.
  We define these architectures
  from a simple computation graph
  and a mechanism to lift it, which increases the number of parameters
  and generalizes the idea of widening multi-layer perceptrons.
  We show that this class contains architectures similar to most common
  deep learning models,
  obtained by sparsifying the weight tensors of the usual architectures at initialization.
  Leveraging tools from algebraic topology
  and random graph theory,
  we use the geometry of the computation graph
  to propagate properties that guarantee
  convergence to any precision for these large sparse models.
\end{abstract}