\begin{abstract}
Are two sets of observations drawn from the same distribution? Answering
this question is the two-sample testing problem.
Kernel methods offer many appealing properties for this task; indeed,
state-of-the-art approaches use the $L^2$ distance between kernel-based
distribution representatives to derive their test statistics. Here, we show
that $L^p$ distances (with $p\geq 1$) between these distribution
representatives give metrics on the space of distributions that are well
suited to detecting differences between distributions, as they metrize weak
convergence. Moreover, for analytic kernels, we show that the $L^1$ geometry
gives improved testing power for scalable computational procedures.
Specifically, we derive a finite-dimensional approximation of the metric,
given as the $\ell_1$ norm of a vector that captures differences of
expectations of analytic functions evaluated at spatial locations or
frequencies (i.e., features). The features can be chosen to maximize the
differences between the distributions and give interpretable indications of
how they differ. Using an $\ell_1$ norm gives better detection because, with
analytic kernels, differences between representatives are dense (non-zero
almost everywhere). The tests are consistent, yet much faster than
state-of-the-art quadratic-time kernel-based tests. Experiments on artificial
and real-world problems demonstrate a better power/time trade-off than the
state of the art, which is based on $\ell_2$ norms, and in some cases better
outright power than even the most expensive quadratic-time tests.
%This performance gain is retained
%even in high dimensions.
\end{abstract}
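The finite-dimensional $\ell_1$ statistic described above can be illustrated with a short Python sketch, offered under stated assumptions rather than as the authors' implementation. It assumes a Gaussian kernel (one analytic kernel) with a fixed bandwidth, and $J$ spatial locations supplied by the caller instead of optimized. It calibrates the test with a permutation test, a swapped-in choice made for simplicity; the paper's own test may normalize the difference vector and calibrate it differently. The names \texttt{gaussian\_kernel\_features}, \texttt{l1\_statistic} and \texttt{l1\_permutation\_test} are hypothetical.

```python
import numpy as np


def gaussian_kernel_features(X, T, bandwidth):
    """Evaluate k(x, t) = exp(-||x - t||^2 / (2 * bandwidth^2)) for every
    sample x in X (shape (n, d)) and location t in T (shape (J, d)).
    Returns an (n, J) matrix of analytic-kernel features."""
    sq_dists = ((X[:, None, :] - T[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def l1_statistic(X, Y, T, bandwidth):
    """l1 norm of the difference between the empirical mean embeddings of
    X and Y, both evaluated at the J locations in T."""
    diff = (gaussian_kernel_features(X, T, bandwidth).mean(axis=0)
            - gaussian_kernel_features(Y, T, bandwidth).mean(axis=0))
    return np.abs(diff).sum()


def l1_permutation_test(X, Y, T, bandwidth=1.0, n_perms=200, seed=0):
    """Return (statistic, p-value). The null distribution is approximated by
    randomly relabeling the pooled sample (a permutation test, assumed here
    for simplicity rather than taken from the paper)."""
    rng = np.random.default_rng(seed)
    # Features of the pooled sample are computed once; permuting rows
    # reassigns samples to the two groups without recomputing the kernel.
    F = gaussian_kernel_features(np.vstack([X, Y]), T, bandwidth)
    n = X.shape[0]

    def stat_from_rows(rows):
        diff = F[rows[:n]].mean(axis=0) - F[rows[n:]].mean(axis=0)
        return np.abs(diff).sum()

    observed = stat_from_rows(np.arange(F.shape[0]))
    exceed = sum(stat_from_rows(rng.permutation(F.shape[0])) >= observed
                 for _ in range(n_perms))
    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perms + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(0.0, 1.0, size=(300, 2))
    Y = rng.normal(1.0, 1.0, size=(300, 2))  # mean-shifted sample
    T = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]])  # hypothetical J = 3 locations
    stat, p = l1_permutation_test(X, Y, T)
    print(f"l1 statistic = {stat:.3f}, permutation p-value = {p:.3f}")
```

Because the kernel features of the pooled sample are computed once, each permutation costs $O((n+m)J)$, which is linear in the sample sizes, in contrast to the quadratic-time statistics mentioned above.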
