\begin{abstract}
We propose a simple and efficient clustering method for
high-dimensional data with a large number of clusters.
Our algorithm achieves high performance by evaluating the distances
between each data point and only a subset of the cluster centres.
It is substantially more efficient than k-means because
it does not require an all-to-all comparison of data points and clusters.
We show that the optimal solutions of our approximation are the same
as those of the exact algorithm, while our approach extracts these
clusters considerably more efficiently than the state of the art.
We compare our approximation with exact k-means and with alternative
approximation approaches on a series of standardised clustering tasks.
For the evaluation, we consider the algorithmic complexity, including the
number of operations to convergence, and the stability of the results.
An efficient implementation of the algorithm is available \href{https://github.com/ooub/peregrine}{online}.
\end{abstract}
