We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes. This is relevant to many real-world settings, such as auctions or taxation, where the principal may know neither the learning behavior nor the rewards of real people. Moreover, the principal should be few-shot adaptable and should minimize the number of interventions, because interventions are often costly. We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents with different learning strategies and reward functions. We validate this approach step by step. First, in a Stackelberg setting with a best-response agent, we show that meta-learning enables quick convergence to the theoretically known Stackelberg equilibrium at test time, although noisy observations severely increase the sample complexity. We then show that our model-based meta-learning approach is cost-effective in intervening on bandit agents with unseen explore-exploit strategies. Finally, we outperform baselines that use either meta-learning or agent behavior modeling, in both $0$-shot and $K=1$-shot settings with partial agent information.