\begin{abstract}
  We consider the problem of identifying the parameters of an unknown
  mixture of two arbitrary $d$-dimensional Gaussians from a sequence
  of independent random samples. Our main results are upper and lower
  bounds giving a computationally efficient moment-based estimator
  with an optimal convergence rate, thus resolving a problem
  introduced by Pearson (1894). Denoting by $\sigma^2$ the variance of
  the unknown mixture, we prove that $\Theta(\sigma^{12})$ samples are
  necessary and sufficient to estimate each parameter up to constant
  additive error when $d=1$. Our upper bound extends to arbitrary
  dimension~$d>1$ up to a (provably necessary) logarithmic loss in~$d$
  using a novel yet simple dimensionality reduction technique. We
  further identify several interesting special cases where the sample
  complexity is notably smaller than our optimal worst-case bound. For
  instance, if the means of the two components are separated by
  $\Omega(\sigma)$, the sample complexity reduces to $O(\sigma^2)$, and
  this is again optimal.


  Our results also apply to learning each component of the mixture up
  to small error in total variation distance, where our algorithm
  gives strong improvements in sample complexity over previous work.
  We also extend our lower bound to mixtures of $k$ Gaussians, showing
  that $\Omega(\sigma^{6k-2})$ samples are necessary to estimate each
  parameter up to constant additive error.

  % Feldman Servedio Odonnell axis-aligned?  Should probably cite
  % them.

%
%  Strikingly, our estimator turns out to be very similar to the one
%  Pearson proposed in 1894 which reduces the one-dimensional problem
%  to solving and analyzing a tractable system of polynomial
%  equations.
%
%Our result greatly improves on the exponent in the sample
%size of the best previous estimator due to Kalai, Moitra and Valiant
%  (2010).

\end{abstract}