\begin{abstract}
We propose a simple and efficient clustering method for
high-dimensional data with a large number of clusters.
Our algorithm achieves high performance by evaluating the distances of
data points to only a subset of the cluster centres.
Our method is substantially more efficient than k-means because
it does not require an all-to-all comparison of data points and cluster centres.
We show that the optimal solutions of our approximation are the same
as those of exact k-means. However, our approach is considerably more
efficient at extracting these clusters than the state of the art.
We compare our approximation with exact k-means and alternative
approximation approaches on a series of standardised clustering tasks.
For the evaluation, we consider the algorithmic complexity, including
the number of operations to convergence, and the stability of the results.
An efficient implementation of the algorithm is available \href{https://github.com/ooub/peregrine}{online}.
\end{abstract}
