\begin{abstract}
Spherical $k$-Means is frequently used to cluster document collections because it performs reasonably well in many settings and is computationally efficient.
However, its time complexity increases linearly with the number of clusters $k$, which limits its suitability for large values of $k$, depending on the size of the collection.
Optimizations targeted at the Euclidean $k$-Means algorithm largely do not apply because the cosine distance is not a metric.
We therefore propose an efficient indexing structure to improve the scalability of Spherical $k$-Means with respect to $k$.
Our approach exploits the sparsity of the input vectors and the convergence behavior of $k$-Means to significantly reduce the number of comparisons in each iteration.
\end{abstract}