\begin{abstract}
% Empirical risk minimization is the workhorse of machine learning, whether for classification and regression or for off-policy policy learning. In practice, we increasingly encounter adaptively collected data, such as the result of running a contextual bandit algorithm.
Empirical risk minimization (ERM) is the workhorse of machine learning, whether for classification and regression or for off-policy policy learning. However, its model-agnostic guarantees can fail when the data are adaptively collected, such as data produced by running a contextual bandit algorithm.
We study a generic importance-sampling-weighted ERM algorithm that uses adaptively collected data to minimize the average of a loss function over a hypothesis class, and we provide first-of-their-kind generalization guarantees and fast convergence rates.
Our results are based on a new maximal inequality that carefully leverages the importance sampling structure to obtain rates with the right dependence on the exploration rate in the data.
For regression, we provide fast rates that leverage the strong convexity of squared-error loss.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero, as is the case for bandit-collected data.
\later{We also provide guarantees for model selection using an adaptive leave-one-out cross-validation.}
An empirical investigation validates our theory.
\end{abstract}
