abstract:44b8f6b6fc74f4ae.tex

1: \begin{abstract}

2: Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms either do not scale well to large datasets or lack theoretical convergence guarantees.

3: This paper introduces a provably robust clustering algorithm based on loss minimization that performs well on Gaussian mixture models with outliers.

4: The paper provides theoretical guarantees that, under certain assumptions, the algorithm attains high accuracy with high probability.

5: Moreover, the algorithm can be used as an initialization strategy for $k$-means clustering.

6: Experiments on large-scale real-world datasets demonstrate that the algorithm is effective when the number of clusters is large. Furthermore, $k$-means initialized by the algorithm outperforms many classic clustering methods in both speed and accuracy, while scaling well to large datasets such as ImageNet.

7: \end{abstract}

8: