\begin{abstract}

Spherical $k$-Means is frequently used to cluster document collections because it performs reasonably well in many settings and is computationally efficient.

However, its time complexity grows linearly with the number of clusters $k$, which, depending on the size of the collection, limits the algorithm's suitability for large values of $k$.

Optimizations targeted at the Euclidean $k$-Means algorithm largely do not apply because the cosine distance is not a metric.

We therefore propose an efficient indexing structure to improve the scalability of Spherical $k$-Means with respect to $k$.

Our approach exploits the sparsity of the input vectors and the convergence behavior of $k$-Means to significantly reduce the number of comparisons in each iteration.

\end{abstract}
