abstract:ea79a12789d2c497.tex

1: \begin{abstract}We consider the paradigm of unsupervised anomaly detection, which involves the identification of anomalies within a dataset in the absence of labeled examples.

2: Although distance-based methods achieve top performance in unsupervised anomaly detection, they are highly sensitive to the choice of the number of nearest neighbors.

3: In this paper, we propose a new distance-based algorithm called \textit{bagged regularized $k$-distances for anomaly detection} (\textit{BRDAD}), which converts the unsupervised anomaly detection problem into a convex optimization problem.

4: Our BRDAD algorithm selects the weights by minimizing the \textit{surrogate risk}, i.e., the finite-sample bound on the empirical risk of the \textit{bagged weighted $k$-distances for density estimation} (\textit{BWDDE}).

5: This approach enables us to address the sensitivity to the hyperparameter choice in distance-based algorithms.

6: Moreover, the bagging technique incorporated in our BRDAD algorithm addresses the efficiency issues that arise on large-scale datasets.

7: On the theoretical side, we establish fast convergence rates for the AUC regret of our algorithm and demonstrate that the bagging technique significantly reduces the computational complexity.

8: On the practical side, we conduct numerical experiments on anomaly detection benchmarks to illustrate the insensitivity of our algorithm to parameter selection compared with other state-of-the-art distance-based methods. Moreover, applying the bagging technique in our algorithm brings promising improvements on real-world datasets.

9: \end{abstract}

10: