\begin{abstract}
  Online $k$-means algorithms are a simple and fundamental class of algorithms for the $k$-means clustering problem. These algorithms maintain $k$ centers $W_1,\ldots, W_k \in \mathbb{R}^d$ and repeatedly (i) receive a data point $X$, (ii) find the center $W_{i^*}$ closest to $X$ among $W_1,\ldots, W_k$, and (iii) update $W_{i^*}$ using $X$. Given the simplicity of these algorithms, it is surprising that the long-term global behavior of this procedure is unknown when it is applied to a never-ending stream of data points drawn from an underlying distribution $p$ on $\mathbb{R}^d$.
  In this work, we prove the first asymptotic convergence result for a general class of online $k$-means algorithms run on streaming data drawn from a distribution. In particular, we show that online $k$-means algorithms over a distribution can be interpreted as stochastic gradient descent with a stochastic learning rate schedule, and that the centers converge asymptotically to the set of stationary points of the $k$-means objective function. Our main technical tool, which we believe is of independent interest, is an extension of the seminal convergence result of \citet{bertsekas2000gradient} to the \emph{non-uniform}, center-specific learning rates of online $k$-means algorithms, which may depend on the past trajectory of the centers.
\end{abstract}
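To make steps (i)--(iii) and the stochastic gradient interpretation concrete, the following is a minimal Python sketch under stated assumptions, not the specific method analyzed here. It assumes the per-sample loss $\tfrac12\|X - W_{i^*}\|^2$ (the factor $\tfrac12$ is a normalization we choose), whose gradient with respect to $W_{i^*}$ is $-(X - W_{i^*})$, so the update $W_{i^*} \leftarrow W_{i^*} + \eta_{i^*}(X - W_{i^*})$ is a stochastic gradient step. It further assumes center-specific rates $\eta_i = 1/n_i$ in the spirit of MacQueen's rule, where $n_i$ counts the points assigned to center $i$ so far; this is only one illustrative member of the general class, picked because its rate is random and depends on the trajectory of the centers. The function name \texttt{online\_kmeans} and its arguments are hypothetical.

```python
import numpy as np


def online_kmeans(stream, init_centers):
    """Hypothetical sketch of online k-means as SGD with per-center rates.

    Assumption: eta_i = 1 / n_i (MacQueen-style counts). The general class
    of online k-means algorithms allows other center-specific schedules.
    """
    W = np.array(init_centers, dtype=float)   # k x d array of centers W_1..W_k
    counts = np.zeros(len(W), dtype=int)      # n_i: points assigned to center i so far
    for x in stream:
        x = np.asarray(x, dtype=float)        # (i) receive a data point X
        # (ii) closest center: i* = argmin_i ||x - W_i||^2
        i_star = int(np.argmin(((W - x) ** 2).sum(axis=1)))
        # (iii) SGD step on 0.5 * ||x - W_{i*}||^2, whose gradient w.r.t.
        # W_{i*} is -(x - W_{i*}). The rate depends on the past assignments
        # through counts, so it is stochastic and center-specific.
        counts[i_star] += 1
        eta = 1.0 / counts[i_star]
        W[i_star] += eta * (x - W[i_star])
    return W, counts
```

With $\eta_i = 1/n_i$, the first assigned point overwrites the initial center (since $\eta_i = 1$), and afterwards each center equals the running mean of the points assigned to it; a single center fed a finite stream therefore ends at that stream's sample mean.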