\begin{abstract}
In this paper, we study the convergence properties of the gradient Expectation-Maximization (EM) algorithm~\cite{lange1995gradient} for Gaussian Mixture Models (GMMs) with a general number of components and general mixing coefficients. We derive a convergence rate that depends on the mixing coefficients, the minimum and maximum pairwise distances between the true centers, the dimension, and the number of components, and we obtain a near-optimal local contraction radius. Several notable recent works derive local convergence rates for EM in the symmetric two-component GMM with equal mixing weights. The general case, however, requires structurally different and non-trivial arguments. We first analyze the population version of the algorithm, which corresponds to an infinite number of observations. For the finite-sample version, we then use recent tools from learning theory and empirical process theory, in particular Rademacher complexity bounds, to control the statistical error of gradient EM.
\end{abstract}
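As an illustrative sketch of the update analyzed here, consider the following simplified setting. We assume identity covariances, known mixing coefficients $\pi_1,\dots,\pi_K$, a fixed step size $s>0$, and samples $x_1,\dots,x_n$. These symbols are introduced only for this illustration and may differ from the notation used later. Unlike standard EM, which fully maximizes the surrogate $Q(\cdot \mid \mu^t)$, gradient EM takes a single gradient ascent step on it at the current iterate $\mu^t = (\mu_1^t,\dots,\mu_K^t)$:
\begin{align*}
w_i(x;\mu) &= \frac{\pi_i \exp\!\big(-\tfrac{1}{2}\|x-\mu_i\|^2\big)}{\sum_{k=1}^{K} \pi_k \exp\!\big(-\tfrac{1}{2}\|x-\mu_k\|^2\big)}, \\
Q(\mu' \mid \mu^t) &= \frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{K} w_i(x_j;\mu^t)\Big(\log \pi_i - \tfrac{1}{2}\|x_j-\mu_i'\|^2\Big), \\
\mu_i^{t+1} &= \mu_i^t + s\,\nabla_{\mu_i'} Q(\mu' \mid \mu^t)\Big|_{\mu'=\mu^t} = \mu_i^t + \frac{s}{n}\sum_{j=1}^{n} w_i(x_j;\mu^t)\,\big(x_j-\mu_i^t\big), \qquad i=1,\dots,K.
\end{align*}
Here $w_i(x;\mu)$ is the posterior responsibility of component $i$ for the point $x$. In the population version, the empirical average over $x_1,\dots,x_n$ is replaced by an expectation over $x$ drawn from the true mixture.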