abstract:7cb0be5870c909ac.tex

1: \begin{abstract}

2: We analyze online \cite{BottouBengio} and mini-batch \cite{Sculley} $k$-means variants. Both scale up the widely used Lloyd's algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised feature learning.

3: We show, for the first time, that both variants converge globally to ``local optima'' at rate $O(\frac{1}{t})$ under general conditions.

4: In addition, we show that if the dataset is clusterable, then with suitable initialization, mini-batch $k$-means converges to an optimal $k$-means solution at rate $O(\frac{1}{t})$ with high probability.

5: The $k$-means objective is non-convex and non-differentiable: we exploit ideas from non-convex gradient-based optimization by providing a novel characterization of the trajectory of the $k$-means algorithm on its solution space, and we circumvent the non-differentiability via geometric insights about the $k$-means update.

6: %We combine ideas from  to analyze the NP-hard $k$-means clustering problem, which has a non-convex and non-differentiable objective.

7: %Our analysis resembles that of stochastic gradient descent, while circumventing the non-differentiability problem via geometric insights of $k$-means update.

8: %Combining techniques from stochastic approximation with a precise characterization of the trajectory of batch Lloyd's algorithm on the solution space,

9: \end{abstract}

10: