553d9502aafe4052.tex
1: \begin{abstract}
2: Issued from Optimal Transport, the Wasserstein distance has gained importance in Machine Learning due to its appealing geometrical properties and the increasing availability of efficient approximations.
3: It owes its recent ubiquity in generative modelling and variational inference to its ability to cope with distributions having non overlapping support.
4: In this work, we consider the problem of estimating the Wasserstein distance between two probability distributions when observations are polluted by outliers. To that end, we investigate how to leverage a Medians of Means (MoM) approach to provide robust estimates. %Median of Means has been recognized as a versatile tool for robust mean estimation: it substitutes the median of means computed on a partition of the sample to the empirical mean.
5: Exploiting the dual Kantorovitch formulation of the Wasserstein distance, we introduce and discuss novel MoM-based robust estimators whose consistency is studied under a data contamination model and for which convergence rates are provided. Beyond computational issues, the choice of the partition size, \textit{i.e.,} the unique parameter of theses robust estimators, is investigated in numerical experiments. Furthermore, these MoM estimators make Wasserstein Generative Adversarial Network (WGAN) robust to outliers, as witnessed by an empirical study on two benchmarks CIFAR10 and Fashion MNIST.
6: %Then,  we turn to
7: %Eventually, we discuss how to combine MoM with the entropy-regularized approximation of the Wasserstein distance and propose a simple MoM-based re-weighting scheme that could be used in conjunction with the Sinkhorn algorithm.
8: 
9: \end{abstract}
10: