abstract:27fb1ac32d9a6c47.tex

1: \begin{abstract}

2: We study how a \planner{} can efficiently and effectively intervene on the rewards of a previously unseen \emph{learning} agent in order to induce desirable outcomes.

3: This is relevant to many real-world settings like auctions or taxation, where the \planner{} may know neither the learning behavior nor the rewards of real people.

4: Moreover, because interventions are often costly, the \planner{} should adapt in few shots and minimize the number of interventions.

5: We introduce \oursolution{}, a model-based meta-learning framework for training a \planner{} that can quickly adapt to out-of-distribution agents with different learning strategies and reward functions.

6: We validate this approach step by step.

7: First, in a Stackelberg setting with a best-response agent, we show that meta-learning enables quick convergence to the theoretically known Stackelberg equilibrium at test time, although noisy observations severely increase the sample complexity.

8: We then show that our model-based meta-learning approach is cost-effective in intervening on bandit agents with unseen explore-exploit strategies.

9: Finally, we outperform baselines that use either meta-learning or agent behavior modeling, in both $0$-shot and $1$-shot settings with partial agent information.

10: \end{abstract}

11: