1: \begin{abstract}
2:
3: Stochastic convex optimization is one of the most well-studied models for learning in modern machine learning. Nevertheless, a fundamental question in this setup has remained unresolved: \begin{quote}
4: {\em how many data points must be observed so that any
5: empirical risk minimizer (ERM) shows good performance on the true population?}
6: \end{quote}
7: This question was posed by Feldman, %\citet{feldman2016generalization}, that was able to show
8: who proved that $\Omega(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ data points are necessary (where~$d$ is the dimension and $\epsilon>0$ is the accuracy parameter). Proving an
9: $\omega(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ lower bound was left as an open problem.
10: In this work we show that in fact $\tilde O(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ data points are also
11: sufficient.
12: This settles the question and yields a new separation between ERMs and uniform convergence.
13:
14:
15: % We resolve an open problem by \citet{feldman2016generalization}. and provide the minmax rates for the sample complexity of Empirical Risk Minimization (ERMs) in the setup of Stochastic Covnex Optimization (SCO).
16:
17:
18: This sample complexity bound holds for the classical setup of learning bounded convex Lipschitz functions over the Euclidean unit ball. We further generalize the result and show that a similar upper bound holds for all symmetric convex bodies.
19: The general bound is composed of two terms: (i) a
20: term of the form $\tilde O(\frac{d}{\epsilon})$ with an inverse-linear dependence on the accuracy parameter, and (ii) a term that depends on the statistical complexity of the class of \emph{linear} functions (captured by the Rademacher complexity).
21: The proof builds a mechanism for controlling the behavior of stochastic convex optimization problems.
22:
23: \end{abstract}
24: