abstract:acb89d6b6f441adb.tex

1: \begin{abstract}

2: % Empirical risk minimization is the workhorse of machine learning, whether for classification and regression or for off-policy policy learning. In practice, we increasingly encounter adaptively collected data, such as the result of running a contextual bandit algorithm.

3: Empirical risk minimization (ERM) is the workhorse of machine learning, whether for classification and regression or for off-policy policy learning, but its model-agnostic guarantees can fail when we use adaptively collected data, such as the result of running a contextual bandit algorithm.

4: We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class and provide first-of-their-kind generalization guarantees and fast convergence rates.

5: Our results are based on a new maximal inequality that carefully leverages the importance sampling structure to obtain rates with the right dependence on the exploration rate in the data.

6: For regression, we provide fast rates that leverage the strong convexity of squared-error loss.

7: For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero, as is the case for bandit-collected data.

8: \later{We also provide guarantees for model selection using an adaptive leave-one-out cross-validation.}

9: An empirical investigation validates our theory.

10: \end{abstract}

11: