44b8f6b6fc74f4ae.tex
1: \begin{abstract}
2: Clustering is a widely used technique with a long and rich history in a variety of areas. However, most existing algorithms do not scale well to large datasets, or are missing theoretical guarantees of convergence. 
3: This paper introduces a provably robust clustering algorithm based on loss minimization that performs well on Gaussian mixture models with outliers. 
4: It provides theoretical guarantees that the algorithm obtains high accuracy with high probability under certain assumptions. 
5: Moreover, it can also be used as an initialization strategy for $k$-means clustering. 
6: Experiments on real-world large-scale datasets demonstrate the effectiveness of the algorithm when clustering a large number of clusters, and a $k$-means algorithm initialized by the algorithm outperforms many of the classic clustering methods in both speed and accuracy, while scaling well to large datasets such as ImageNet.
7: \end{abstract}
8: