1: \begin{abstract}
2: Robust density estimation refers to the consistent estimation of a density function even when the data are contaminated by outliers. We find that existing forest density estimation at a given point is inherently resistant to outliers that lie outside the cells containing the point, which we call \textit{non-local outliers}, but not to the remaining outliers, which we call \textit{local outliers}.
3: To achieve robustness against all outliers, we propose an ensemble learning algorithm called \textit{medians of forests for robust density estimation} (\textit{MFRDE}), which takes the pointwise median of forest density estimators fitted on subsampled datasets.
4: Compared to existing robust kernel-based methods, MFRDE allows larger subsampling sizes, so it sacrifices less density estimation accuracy to achieve robustness.
5: On the theoretical side, we introduce the local outlier exponent to quantify the number of local outliers. Under this exponent, we show that even if the number of outliers reaches a certain polynomial order in the sample size, MFRDE is able to achieve almost the same convergence rate as the same algorithm on uncontaminated data, whereas robust kernel-based methods fail.
6: On the practical side, real data experiments show that MFRDE outperforms existing robust kernel-based methods.
7: Moreover, we apply MFRDE to anomaly detection to showcase a further application.
8:
9: \medskip
10: \noindent {\bf Keywords:} density estimation; robust statistics; random forest; median of means \\
11: \end{abstract}
12: