\begin{abstract}
We introduce a new theoretical framework for analyzing deep learning optimization in connection with its generalization error.
Existing frameworks for neural network optimization analysis,
such as mean field theory and neural tangent kernel theory,
typically require taking the infinite-width limit of the network to show global convergence.
This makes it difficult to deal directly with finite-width networks; in particular, in the neural tangent kernel regime, we cannot reveal favorable properties of neural networks beyond kernel methods.
To realize a more natural analysis, we take a completely different approach in which
we formulate parameter training as transportation map estimation and show its global convergence via the theory of {\it infinite dimensional Langevin dynamics}.
This enables us to analyze narrow and wide networks in a unified manner.
Moreover, we give generalization gap and excess risk bounds for the solution obtained by the dynamics.
The excess risk bound achieves the so-called fast learning rate.
In particular, we show exponential convergence for a classification problem and a minimax optimal rate for a regression problem.

%for deep learning optimization theory
%not only the global optimality but also a nice generalization ability of trained network.
%Indeed, we present generalization gap and excess risk bounds.

\end{abstract}