\begin{abstract}
A line of recent work has analyzed the behavior of the
Expectation-Maximization (EM) algorithm in the well-specified setting,
in which the population likelihood is locally strongly concave around
its maximizing argument. Examples include suitably separated Gaussian
mixture models and mixtures of linear regressions. We consider
over-specified settings in which the number of fitted components is
larger than the number of components in the true distribution. Such
mis-specified settings can lead to singularity in the Fisher
information matrix, and moreover, the maximum likelihood estimator
based on $n$ i.i.d. samples in $d$ dimensions can have a non-standard
$\mathcal{O}((d/n)^{\frac{1}{4}})$ rate of convergence. Focusing on
the simple setting of two-component mixtures fit to a $d$-dimensional
Gaussian distribution, we study the behavior of the EM algorithm both
when the mixture weights are unequal (unbalanced case) and when they
are equal (balanced case). Our analysis reveals a sharp distinction
between these two cases: in the former, the EM algorithm converges
geometrically to a point at Euclidean distance of
$\mathcal{O}((d/n)^{\frac{1}{2}})$ from the true parameter, whereas in
the latter case, the convergence rate is exponentially slower, and the
fixed point has a much lower $\mathcal{O}((d/n)^{\frac{1}{4}})$
accuracy. Analysis of this singular case requires the introduction of
some novel techniques: in particular, we make use of a careful form of
localization in the associated empirical process, and develop a
recursive argument to progressively sharpen the statistical rate.
\end{abstract}