a9540c7eaf20ac24.tex
1: \begin{abstract}
2: 
3: We give a highly efficient ``semi-agnostic'' algorithm
4: for learning univariate probability
5: distributions that are well approximated by piecewise polynomial density
6: functions.  Let $p$ be an arbitrary distribution over an interval $I$
7: which is $\tau$-close (in total variation distance)
8: to an unknown probability distribution $q$ that is
9: defined by an unknown partition of $I$ into $t$ intervals and $t$
10: unknown degree-$d$ polynomials specifying $q$ over each of the intervals.
11: We give an algorithm that draws $\tilde{O}(t\new{(d+1)}/\eps^2)$ samples
12: from $p$, runs in time $\poly(t,d,1/\eps)$, and with high
13: probability outputs a piecewise polynomial hypothesis distribution $h$ that 
14: is $(O(\tau)+\eps)$-close (in total variation distance) to $p$.
15: \new{This sample complexity is essentially optimal; we show that even
16: for $\tau=0$, any 
17: algorithm that learns an unknown $t$-piecewise degree-$d$
18: probability distribution over $I$ to accuracy $\eps$ must 
19: use $\Omega(\frac{t(d+1)}{\poly(1 + \log(d+1))} \cdot \frac{1}{\eps^2})$ samples
20: from the distribution, regardless of its running time.}
21: Our algorithm combines tools from approximation theory, uniform convergence,
22: linear programming, and dynamic programming.
23: 
24: We apply this general algorithm to obtain results
25: for a wide range of natural problems in density estimation
26: over both continuous and discrete domains.
27: These include state-of-the-art results for learning 
28: mixtures of log-concave distributions; mixtures of $t$-modal
29: distributions; mixtures of Monotone Hazard Rate distributions;
30: mixtures of Poisson Binomial Distributions; mixtures of
31: Gaussians; and mixtures of $k$-monotone densities.  
32: Our general technique
33: yields computationally efficient algorithms for all these problems,
34: in many cases with provably optimal sample complexities
35: (up to logarithmic factors) in all parameters.
36: 
37: 
38: \end{abstract}
39: