0751ab101155bda2.tex
1: \begin{abstract}
2:     We introduce a (de)-regularization of the Maximum Mean Discrepancy
3: (DrMMD) and its Wasserstein gradient flow. 
4: Existing gradient flows that transport samples from source distribution to target distribution with only target samples, either lack tractable numerical implementation ($f$-divergence flows) or require strong assumptions, and modifications such as noise injection, to ensure convergence (Maximum Mean Discrepancy flows).
5: In contrast, DrMMD flow can simultaneously (i) guarantee near-global convergence for a broad class of targets in both continuous and discrete time, and (ii) be implemented in closed form using only samples.
6: The former is achieved by leveraging the connection between the
7: DrMMD and the $\chi^2$-divergence, while the latter comes by treating DrMMD as MMD with a de-regularized kernel. 
8: Our numerical scheme uses an adaptive de-regularization schedule throughout the flow to optimally trade off between discretization errors and deviations from the $\chi^2$ regime.
9: The potential application of the DrMMD flow is demonstrated across several numerical experiments, including a large-scale setting of training student/teacher networks.
10: \end{abstract}
11: