
1: \begin{abstract}

2:

3:     Stochastic convex optimization is one of the most well-studied models of learning in modern machine learning. Nevertheless, a fundamental question in this setup has remained unresolved: \begin{quote}

4: {\em how many data points must be observed so that any

5:     empirical risk minimizer (ERM) shows good performance on the true population?}

6:     \end{quote}

7:     This question was posed by Feldman %\citet{feldman2016generalization}, that was able to show

8:     who proved that $\Omega(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ data points are necessary (where~$d$ is the dimension and $\epsilon>0$ is the accuracy parameter). Proving an

9:     $\omega(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ lower bound was left as an open problem.

10:     In this work we show that in fact $\tilde O(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ data points are also

11:     sufficient.

12: This settles the question and yields a new separation between ERMs and uniform convergence.

13:

14:

15:    % We resolve an open problem by \citet{feldman2016generalization}. and provide the minmax rates for the sample complexity of Empirical Risk Minimization (ERMs) in the setup of Stochastic Covnex Optimization (SCO).

16:

17:

18:     This sample complexity holds for the classical setup of learning bounded convex Lipschitz functions over the Euclidean unit ball. We further generalize the result and show that a similar upper bound holds for all symmetric convex bodies.

19:     The general bound is composed of two terms: (i) a

20:     term of the form $\tilde O(\frac{d}{\epsilon})$ with an inverse-linear dependence on the accuracy parameter, and (ii) a term that depends on the statistical complexity of the class of \emph{linear} functions (captured by the Rademacher complexity).

21:     The proof builds a mechanism for controlling the behavior of stochastic convex optimization problems.

22:

23: \end{abstract}

24: