1: \begin{abstract}
2: 
3:     Stochastic convex optimization is one of the most widely studied models for learning in modern machine learning. Nevertheless, a fundamental question in this setup has remained unresolved: \begin{quote}
4: {\em how many data points must be observed so that any 
5:     empirical risk minimizer (ERM) shows good performance on the true population?}    
6:     \end{quote}
7:     This question was posed by Feldman, %\citet{feldman2016generalization}
8:     who proved that $\Omega(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ data points are necessary (where~$d$ is the dimension and $\epsilon>0$ is the accuracy parameter). Proving an
9:     $\omega(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ lower bound was left as an open problem. 
10:     In this work we show that in fact $\tilde O(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ data points are also
11:     sufficient. 
12: This settles the question and yields a new separation between ERMs and uniform convergence.
13: 
14:     
15:    % We resolve an open problem by \citet{feldman2016generalization}. and provide the minmax rates for the sample complexity of Empirical Risk Minimization (ERMs) in the setup of Stochastic Covnex Optimization (SCO). 
16:     
17: 
18:     This sample complexity bound holds for the classical setup of learning bounded convex Lipschitz functions over the Euclidean unit ball. We further generalize the result and show that a similar upper bound holds for all symmetric convex bodies.
19:     The general bound is composed of two terms: (i) a 
20:     term of the form $\tilde O(\frac{d}{\epsilon})$ with an inverse-linear dependence on the accuracy parameter, and (ii) a term that depends on the statistical complexity of the class of \emph{linear} functions (captured by the Rademacher complexity).
21:     The proof builds a mechanism for controlling the behavior of stochastic convex optimization problems. 
22:     
23: \end{abstract}
24: