\begin{abstract}
Subsampling methods aim to select a subsample that serves as a surrogate for the observed sample.
In recent decades, such methods have been widely used in large-scale data analytics, active learning, and privacy-preserving analysis.
In contrast to model-based methods, this paper studies model-free subsampling methods, which aim to select a subsample without relying on model assumptions.
Existing model-free subsampling methods are usually built upon clustering techniques or kernel tricks.
Most of these methods suffer from either a heavy computational burden or a theoretical weakness, namely, that the empirical distribution of the selected subsample need not converge to the population distribution.
These computational and theoretical limitations hinder the broad applicability of model-free subsampling methods in practice.
We propose a novel model-free subsampling method based on optimal transport techniques.
Moreover, we develop an efficient subsampling algorithm that adapts to the unknown probability density function.
Theoretically, we show that the selected subsample can be used for efficient density estimation by deriving the convergence rate of the proposed subsample kernel density estimator.
We also derive the optimal bandwidth for the proposed estimator.
Numerical studies on synthetic and real-world datasets demonstrate the superior performance of the proposed method.
\end{abstract}