\begin{abstract}
We study the gradient Expectation-Maximization (EM) algorithm for Gaussian Mixture Models (GMMs) in the over-parameterized setting, where a general GMM with $n>1$ components learns from data generated by a single ground-truth Gaussian distribution.
While results for the special case of mixtures of two Gaussians are well known, a global convergence analysis for arbitrary $n$ remains unresolved and faces several new technical barriers, since the convergence becomes sublinear and non-monotonic.
To address these challenges, we construct a novel likelihood-based convergence analysis framework and rigorously prove that gradient EM converges globally at a sublinear rate of $O(1/\sqrt{t})$. This is the first global convergence result for Gaussian mixtures with more than $2$ components.
The sublinear convergence rate stems from the algorithmic nature of learning over-parameterized GMMs with gradient EM. We also identify a new technical challenge in learning general over-parameterized GMMs: the existence of bad local regions that can trap gradient EM for an exponential number of steps.
%providing the first global convergence proof of gradient EM beyond the special case of $n=2$.
% We further present a matching $\Omega(1/\sqrt{T})$ convergence rate lower bound, implying the tightness of our convergence bound and revealing a slow-down effect on convergence caused by over-parameterization.
%This result shows that gradient EM converges exponentially slower in the over-parameterized setting, compared to the exact parameterized setting where it enjoys a $\exp(-\Omega(T))$ linear convergence rate.
% Our proof of the convergence rate lower bound is based on another novel potential function and explains that the cause of the slower convergence rate is %due to
% additional degrees of freedom in the parameter space introduced by over-parameterization. %\mf{maybe remove the rest of this sentence, might sound too simplistic:}
\end{abstract}