\begin{abstract}
  Learning algorithms for implicit generative models can optimize a
  variety of criteria that measure how the data distribution differs
  from the implicit model distribution, including the Wasserstein
  distance, the energy distance, and the Maximum Mean Discrepancy
  (MMD) criterion. A careful look at the geometries that these
  distances induce on the space of probability measures reveals
  important differences among them. In particular, we can establish
  surprising approximate global convergence guarantees for the
  $1$-Wasserstein distance, even when the parametric generator has a
  nonconvex parametrization.
\end{abstract}
