c4be2109ebde44c3.tex
1: \begin{abstract}
2: We consider non-parametric density estimation in the framework of local approximate differential privacy.
3: In contrast to centralized privacy scenarios with a trusted curator, in the local setup anonymization must be guaranteed already on the individual data owners' side and therefore must precede any data mining tasks.
4: Thus, the published anonymized data should be compatible with as many statistical procedures as possible.
5: We suggest adding Laplace noise and Gaussian processes (both appropriately scaled) to kernel density estimators to obtain approximate differential private versions of the latter ones.
6: We obtain minimax type results over Sobolev classes indexed by a smoothness parameter $s>1/2$ for the mean squared error at a fixed point.
7: In particular, we show that taking the average of private kernel density estimators from $n$ different data owners attains the optimal rate of convergence if the bandwidth parameter is correctly specified.
8: Notably, the optimal convergence rate in terms of the sample size $n$ is $n^{-(2s-1)/(2s+1)}$ under local differential privacy and thus deteriorated to the rate $n^{-(2s-1)/(2s)}$ which holds without privacy restrictions.
9: Since the optimal choice of the bandwidth parameter depends on the smoothness $s$ and is thus not accessible in practice, adaptive methods for bandwidth selection are necessary and must, in the local privacy framework, be performed directly on the anonymized data.
10: We address this problem by means of a variant of Lepski's method tailored to the privacy setup and obtain general oracle inequalities for private kernel density estimators.
11: In the Sobolev case, the resulting adaptive estimator attains the optimal rate of convergence at least up to extra logarithmic factors.
12: \end{abstract}
13: