\begin{abstract} Differential equations provide a framework for representing invertible transformations of measures, which have recently been
used extensively to model complex
probability distributions (e.g., in generative modelling and density estimation). While such models have achieved enormous
success in machine learning and data science, little is known about
their statistical properties. In this work, we first establish a
general statistical convergence theorem for distribution learning
via ODE-parameterized transport maps that is applicable to any
velocity field class $\mathcal{F}$ satisfying certain simple
constraints. The proof of this general theorem combines analytical
stability estimates for ODEs with classical empirical process theory
for sieved M-estimators. Subsequently, we specialize the general
theorem to $C^k$-smooth densities. We show that the velocity field
inherits the regularity of the target density, which enables
considering estimation over a $C^k$ ball and obtaining a concrete
minimax convergence rate. Finally, we consider the setting of neural
differential equations (neural ODEs), where $\mathcal{F}$ is
parameterized by a neural network class. Applying our general
theorem with classical neural network approximation results and metric entropy
rates, we obtain minimax convergence rates and show how the network
size (e.g., width, depth, sparsity, and norm constraints) should
scale with the sample size $n$.

\end{abstract}
