\begin{abstract}
Subsampling methods aim to select a subsample as a surrogate for the observed sample.
Such methods have been widely used in large-scale data analytics, active learning, and privacy-preserving analysis in recent decades.
In contrast to model-based methods, this paper studies model-free subsampling methods, which aim to identify a subsample that does not depend on model assumptions.
Existing model-free subsampling methods are usually built upon clustering techniques or kernel tricks.
Most of these methods suffer from either a large computational burden or a theoretical weakness.
In particular, the theoretical weakness is that the empirical distribution of the selected subsample may not converge to the population distribution.
Such computational and theoretical limitations hinder the broad applicability of model-free subsampling methods in practice.
We propose a novel model-free subsampling method by utilizing optimal transport techniques.
Moreover, we develop an efficient subsampling algorithm that is adaptive to the unknown probability density function.
Theoretically, we show that the selected subsample can be used for efficient density estimation by deriving the convergence rate for the proposed subsample kernel density estimator.
We also provide the optimal bandwidth for the proposed estimator.
Numerical studies on synthetic and real-world datasets demonstrate the superior performance of the proposed method.
\end{abstract}