71f552f9d4dc6952.tex
1: \begin{abstract}
2: Robust density estimation refers to the consistent estimation of a density function even when the data are contaminated by outliers. We find that an existing forest density estimator, evaluated at a given point, is inherently resistant to outliers lying outside the cells containing that point, which we call \textit{non-local outliers}, but not to the remaining outliers, which we call \textit{local outliers}.
3: To achieve robustness against all outliers, we propose an ensemble learning algorithm called \textit{medians of forests for robust density estimation} (\textit{MFRDE}), which takes the pointwise median of forest density estimators fitted on subsampled datasets.
4: Compared to existing robust kernel-based methods, MFRDE allows larger subsampling sizes and therefore sacrifices less density estimation accuracy while achieving robustness.
5: On the theoretical side, we introduce the local outlier exponent to quantify the number of local outliers. Under this exponent, we show that even if the number of outliers reaches a certain polynomial order in the sample size, MFRDE is able to achieve almost the same convergence rate as the same algorithm on uncontaminated data, whereas robust kernel-based methods fail.
6: On the practical side, real data experiments show that MFRDE outperforms existing robust kernel-based methods. 
7: Moreover, we apply MFRDE to anomaly detection to showcase a further application.
8: 
9: \medskip
10: \noindent  {\bf Keywords:} density estimation; robust statistics; random forest; median of means \\
11: \end{abstract}
12: