\begin{abstract}
  Differential equations provide a framework to represent invertible
  transformations of measures, which have recently been used extensively
  to represent complex probability distributions (e.g., in generative
  modelling and density estimation). While such models have achieved
  enormous success in machine learning and data science, little is known
  about their statistical properties. In this work, we first establish a
  general statistical convergence theorem for distribution learning
  via ODE-parameterized transport maps that is applicable to any
  velocity field class $\mathcal{F}$ satisfying certain simple
  constraints. The proof of this general theorem combines analytical
  stability estimates for ODEs with classical empirical process theory
  for sieved M-estimators. Subsequently, we specialize the general
  theorem to $C^k$-smooth densities. We show that the velocity field
  inherits the regularity of the target density, which allows us to
  consider estimation over a $C^k$ ball and to obtain a concrete
  minimax convergence rate. Finally, we consider the setting of neural
  differential equations (neural ODEs), where $\mathcal{F}$ is
  parameterized by a neural network class. Applying our general
  theorem with classical NN approximation results and metric entropy
  rates, we obtain minimax convergence rates and show how the network
  size (e.g., width, depth, sparsity, and norm constraints) should
  scale with the sample size $n$.
\end{abstract}
