\begin{abstract}
  A recurring theme in statistical learning, online learning, and beyond is
  that faster convergence rates are possible for problems with low noise,
  often quantified by the performance of the best hypothesis; such results
  are known as \emph{first-order} or \emph{small-loss} guarantees. While
  first-order guarantees are relatively well understood in statistical and
  online learning, adapting to low noise in \emph{contextual bandits} (and
  more broadly, decision making) presents major algorithmic challenges. In
  a COLT 2017 open problem,~\citet{agarwal2017open} asked whether
  first-order guarantees are even possible for contextual bandits
  and---if so---whether they can be attained by efficient
  algorithms. We give a resolution to this question by providing an optimal
  and efficient reduction from contextual bandits to online regression
  with the logarithmic (or, cross-entropy) loss. Our algorithm is
  simple and practical, readily accommodates rich function classes,
  and requires no distributional assumptions beyond realizability. In
  a large-scale empirical evaluation, we find that our approach
  typically outperforms comparable non-first-order methods.

  On the technical side, we show that the logarithmic loss and an
  information-theoretic quantity called the \emph{triangular discrimination}
  play a fundamental role in obtaining first-order guarantees, and we
  combine this observation with new refinements to the regression oracle
  reduction framework of \citet{foster2020beyond}. The use of triangular
  discrimination yields novel results even for the classical statistical
  learning model, and we anticipate that it will find broader use.
\end{abstract}