\begin{abstract}%
  Consider a regression problem where the learner is given
  a large collection of $d$-dimensional data points, but can only
  query a small subset of the real-valued labels. How many queries are
  needed to obtain a $1+\epsilon$ relative error approximation of the
  optimum? While this problem has been extensively studied for least
  squares regression, little is known for other losses. An important
  example is least absolute deviation regression ($\ell_1$ regression)
  which enjoys superior robustness to outliers compared to least
  squares. We develop a new framework for analyzing importance
  sampling methods in regression problems, which enables us to show
  that the query complexity of
  least absolute deviation regression is $\Theta(d/\epsilon^2)$ up to
  logarithmic factors. We further extend our techniques to show the
  first bounds on the query complexity for any $\ell_p$ loss with
  $p\in(1,2)$. As a key novelty in our analysis, we introduce the
  notion of \emph{\cuc}, which is a new approximation guarantee for the
  empirical loss. While it is inspired by uniform
  convergence in statistical learning, our approach additionally incorporates a correction
  term to avoid unnecessary variance due to outliers. This
  can be viewed as a new connection between statistical learning theory and
  variance reduction techniques in stochastic optimization,
  which should be of
  independent interest.
\end{abstract}
