\begin{abstract}%
  Consider a regression problem where the learner is given
  a large collection of $d$-dimensional data points, but can only
  query a small subset of the real-valued labels. How many queries are
  needed to obtain a $1+\epsilon$ relative error approximation of the
  optimum? While this problem has been extensively studied for least
  squares regression, little is known for other losses. An important
  example is least absolute deviation regression ($\ell_1$ regression),
  which enjoys superior robustness to outliers compared to least
  squares. We develop a new framework for analyzing importance
  sampling methods in regression problems, which enables us to show
  that the query complexity of
  least absolute deviation regression is $\Theta(d/\epsilon^2)$ up to
  logarithmic factors. We further extend our techniques to show the
  first bounds on the query complexity for any $\ell_p$ loss with
  $p\in(1,2)$. As a key novelty in our analysis, we introduce the
  notion of \emph{\cuc}, which is a new approximation guarantee for the
  empirical loss. While it is inspired by uniform
  convergence in statistical learning, our approach additionally incorporates a correction
  term to avoid unnecessary variance due to outliers. This
  can be viewed as a new connection between statistical learning theory and
  variance reduction techniques in stochastic optimization,
  which should be of
  independent interest.
\end{abstract}