abstract:9e9e26776efbf852.tex

1: \begin{abstract}

2: In stochastic convex optimization the goal is to minimize a convex function $F(x) \doteq \E_{\bff\sim D}[\bff(x)]$ over a convex set $\K \subset \R^d$ where $D$ is some unknown distribution and each $f(\cdot)$ in the support of $D$ is convex over $\K$. The optimization is commonly based on i.i.d.~samples $f^1,f^2,\ldots,f^n$ from $D$.

3: A standard approach to such problems is empirical risk minimization (ERM) that optimizes $F_S(x) \doteq \fr{n}\sum_{i\leq n} f^i(x)$. Here we consider the question of how many samples are necessary for ERM to succeed and the closely related question of uniform convergence of $F_S$ to $F$ over $\K$. We demonstrate that in the standard $\ell_p/\ell_q$ setting of Lipschitz-bounded functions over a $\K$ of bounded radius, ERM requires sample size that scales linearly with the dimension $d$. This nearly matches standard upper bounds and improves on $\Omega(\log d)$ dependence proved for $\ell_2/\ell_2$ setting in \cite{SSSSS:2009}. In stark contrast, these problems can be solved using dimension-independent number of samples for $\ell_2/\ell_2$ setting and $\log d$ dependence for $\ell_1/\ell_\infty$ setting using other approaches.

4:

5: We further show that our lower bound applies even if the functions in the support of $D$ are smooth and efficiently computable and even if an $\ell_1$ regularization term is added. Finally, we demonstrate that for a more general class of bounded-range (but not Lipschitz-bounded) stochastic convex programs an infinite gap appears already in dimension 2.

6: \end{abstract}

7: