\begin{abstract}
A recurring theme in statistical learning, online learning, and beyond is that faster convergence rates are possible for problems
with low noise, often quantified by the performance of the best
hypothesis; such results are known as \emph{first-order} or
\emph{small-loss} guarantees. While first-order guarantees are
relatively well understood in statistical and online learning, adapting to low noise in \emph{contextual bandits} (and more broadly, decision making) presents major algorithmic challenges. In
a COLT 2017 open problem,~\citet{agarwal2017open} asked whether
first-order guarantees are even possible for contextual bandits
and---if so---whether they can be attained by efficient
algorithms. We give a resolution to this question by providing an optimal and
efficient reduction from contextual bandits to online regression
with the logarithmic (or, cross-entropy) loss. Our algorithm is
simple and practical, readily accommodates rich function classes,
and requires no distributional assumptions beyond realizability. In
a large-scale empirical evaluation, we find that our approach
typically outperforms comparable non-first-order methods.

On the technical side, we show that the logarithmic loss and an information-theoretic quantity called the
\emph{triangular discrimination} play a fundamental role in obtaining first-order
guarantees,
and we combine this observation with %
new refinements
to the regression oracle reduction framework of \citet{foster2020beyond}.
The use of triangular discrimination yields novel results even for the classical statistical learning model, and we anticipate that it will find broader use.

%
\end{abstract}
