1: \begin{abstract}
2: Optimal transport distances are powerful tools for comparing probability distributions and have found many applications in machine learning. Yet their algorithmic complexity prevents their direct use on large-scale datasets. To overcome this challenge, practitioners compute these distances on minibatches, {\em i.e.}, they average the outcome of several smaller optimal transport problems. In this paper, we propose an analysis of this practice, whose effects are not well understood so far. We notably argue that it is equivalent to an implicit regularization of the original problem, with appealing properties such as unbiased estimators, unbiased gradients, and a concentration bound around the expectation, but also with defects such as the loss of the distance property. Along with this theoretical analysis, we also conduct empirical experiments on gradient flows, GANs, and color transfer that highlight the practical interest of this strategy.
3: %convergence in population independent from data space dimensionality
4: %The Wasserstein distance has cubic complexity, which makes it computationally challenging. To overcome this challenge, practitioners rely on computing the Wasserstein distance on minibatches. While this replaces the original problem, it has been effective in practice for domain adaptation and generative modeling tasks. In this paper we propose a deeper study of the minibatch optimal transport paradigm. We show that it inherits properties from OT and has a mass-spreading behavior similar to regularized Wasserstein distance variants. It also has properties which are not shared with OT losses: an unbiased estimator, asymptotic convergence without dependence on dimension, and unbiased gradients. But using minibatches induces a bias which breaks the first distance axiom. To study its behavior, we consider gradient flow experiments on the CelebA dataset, a toy GAN example, and color transfer experiments between 1M-pixel images.
5: \end{abstract}
6: