\begin{abstract}%
Consider a regression problem where the learner is given
a large collection of $d$-dimensional data points, but can only
query a small subset of the real-valued labels. How many queries are
needed to obtain a $1+\epsilon$ relative error approximation of the
optimum? While this problem has been extensively studied for least
squares regression, little is known for other losses. An important
example is least absolute deviation regression ($\ell_1$ regression)
which enjoys superior robustness to outliers compared to least
squares. We develop a new framework for analyzing importance
sampling methods in regression problems, which enables us to show
that the query complexity of
least absolute deviation regression is $\Theta(d/\epsilon^2)$ up to
logarithmic factors. We further extend our techniques to show the
first bounds on the query complexity for any $\ell_p$ loss with
$p\in(1,2)$. As a key novelty in our analysis, we introduce the
notion of \emph{\cuc}, which is a new approximation guarantee for the
empirical loss. While it is inspired by uniform
convergence in statistical learning, our approach additionally incorporates a correction
term to avoid unnecessary variance due to outliers. This
can be viewed as a new connection between statistical learning theory and
variance reduction techniques in stochastic optimization,
which should be of
independent interest.
\end{abstract}