\begin{abstract}
  A recurring theme in statistical learning, online learning, and beyond
  is that faster convergence rates are possible for problems with low
  noise, often quantified by the performance of the best hypothesis; such
  results are known as \emph{first-order} or \emph{small-loss}
  guarantees. While first-order guarantees are relatively well understood
  in statistical and online learning, adapting to low noise in
  \emph{contextual bandits} (and more broadly, decision making) presents
  major algorithmic challenges. In a COLT 2017 open problem,
  \citet{agarwal2017open} asked whether first-order guarantees are even
  possible for contextual bandits and, if so, whether they can be
  attained by efficient algorithms. We resolve this question by providing
  an optimal and efficient reduction from contextual bandits to online
  regression with the logarithmic (or cross-entropy) loss. Our algorithm
  is simple and practical, readily accommodates rich function classes,
  and requires no distributional assumptions beyond realizability. In a
  large-scale empirical evaluation, we find that our approach typically
  outperforms comparable non-first-order methods.

  On the technical side, we show that the logarithmic loss and an
  information-theoretic quantity called the \emph{triangular
  discrimination} play a fundamental role in obtaining first-order
  guarantees, and we combine this observation with new refinements to the
  regression oracle reduction framework of \citet{foster2020beyond}. The
  use of triangular discrimination yields novel results even for the
  classical statistical learning model, and we anticipate that it will
  find broader use.
\end{abstract}
