\begin{abstract}
Are two sets of observations drawn from the same distribution? Answering this
question is the two-sample testing problem.
Kernel methods offer many appealing properties for this problem. Indeed, state-of-the-art
approaches use the $L^2$ distance between kernel-based
distribution representatives to derive their test statistics. Here, we show that
$L^p$ distances (with $p\geq 1$) between these
distribution representatives give metrics on the space of distributions that are
well suited to detecting differences between distributions, as they
metrize weak convergence. Moreover, for analytic kernels,
we show that the $L^1$ geometry gives improved testing power for
scalable computational procedures. Specifically, we derive a finite-dimensional
approximation of the metric, given as the $\ell_1$ norm of a vector that
captures differences of expectations of analytic functions evaluated at spatial
locations or frequencies (i.e., features). The features can be chosen to
maximize the differences between the distributions and give interpretable
indications of how they differ. Using an $\ell_1$ norm gives better detection
because differences between representatives are dense
when we use analytic kernels (non-zero almost everywhere). The tests are consistent, while
being much faster than state-of-the-art quadratic-time kernel-based tests. Experiments
on artificial and real-world problems demonstrate
a better power/time tradeoff than the state of the art, based on
$\ell_2$ norms, and in some cases, better outright power than even the most
expensive quadratic-time tests. %This performance gain is retained
%even in high dimensions.
\end{abstract}