\begin{abstract}
  Differential equations provide a framework for representing
  invertible transformations of measures, which have recently been
  used extensively to represent complex probability distributions
  (e.g., in generative modelling and density estimation). While such
  models have achieved enormous success in machine learning and data
  science, little is known about their statistical properties. In
  this work, we first establish a general statistical convergence
  theorem for distribution learning via ODE-parameterized transport
  maps that is applicable to any velocity field class $\mathcal{F}$
  satisfying certain simple constraints. The proof of this general
  theorem combines analytical stability estimates for ODEs with
  classical empirical process theory for sieved M-estimators.
  Subsequently, we specialize the general theorem to $C^k$-smooth
  densities. We show that the velocity field inherits the regularity
  of the target density, which enables estimation over a $C^k$ ball
  and yields a concrete minimax convergence rate. Finally, we
  consider the setting of neural differential equations (neural
  ODEs), where $\mathcal{F}$ is parameterized by a neural network
  class. Combining our general theorem with classical neural network
  approximation results and metric entropy rates, we obtain minimax
  convergence rates and show how the network size (e.g., width,
  depth, sparsity, and norm constraints) should scale with the sample
  size $n$.\ymtd{make some edits here}

\end{abstract}