\begin{abstract}
We introduce a new theoretical framework for analyzing deep learning optimization in connection with its generalization error.
Existing frameworks for analyzing neural network optimization, such as mean field theory and neural tangent kernel theory, typically require taking the infinite-width limit of the network to show global convergence.
This potentially makes it difficult to deal directly with finite-width networks; in particular, in the neural tangent kernel regime, we cannot reveal favorable properties of neural networks beyond kernel methods.
To realize a more natural analysis, we consider a completely different approach in which we formulate parameter training as transportation map estimation and show its global convergence via the theory of {\it infinite-dimensional Langevin dynamics}.
This enables us to analyze narrow and wide networks in a unified manner.
Moreover, we give generalization gap and excess risk bounds for the solution obtained by the dynamics.
The excess risk bound achieves the so-called fast learning rate.
In particular, we show exponential convergence for a classification problem and a minimax optimal rate for a regression problem.
%for deep learning optimization theory
%not only the global optimality but also a nice generalization ability of trained network.
%Indeed, we present generalization gap and excess risk bounds.
\end{abstract}