We consider the problem of steering no-regret-learning agents to play desirable equilibria in extensive-form games via nonnegative payments. We show that steering is impossible if the total budget (summed across iterations) is finite. However, steering is possible when only the average realized payment is required to converge to zero. In the full-feedback setting, where players' full strategies are observed at each timestep, steering is possible with constant per-iteration payments. In the bandit-feedback setting, where only trajectories through the game tree are observable, steering is impossible with constant per-iteration payments but becomes possible if the maximum per-iteration payment is allowed to grow with time while the average realized payment still vanishes. We supplement our positive theoretical results with experiments highlighting the efficacy of steering in large extensive-form games, and show how our framework relates to optimal mechanism design and information design.
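The budget regimes contrasted above can be written out as a rough sketch. The notation here is an assumption made for illustration and is not taken from the paper: $p_i^{(t)} \ge 0$ is the realized payment to player $i$ at iteration $t$, $B$ is a total budget, and $P_t$ is a cap on any single payment at iteration $t$.

```latex
% Hypothetical notation (not from the source): p_i^{(t)} >= 0 is the realized
% payment to player i at iteration t; B is a total budget; P_t caps any
% single payment at iteration t.
\begin{align*}
\text{finite total budget:} \quad
  & \sum_{t=1}^{\infty} \sum_{i} p_i^{(t)} \le B < \infty, \\
\text{vanishing average realized payments:} \quad
  & \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \sum_{i} p_i^{(t)} = 0, \\
\text{constant per-iteration payments:} \quad
  & P_t = P \ \text{for all } t, \\
\text{growing per-iteration payments:} \quad
  & P_t \to \infty \ \text{as } t \to \infty .
\end{align*}
% The second condition is strictly weaker than the first. For example,
% a single payment stream p^{(t)} = t^{-1/2} has a divergent sum, yet
% \frac{1}{T} \sum_{t=1}^{T} t^{-1/2} \le \frac{2}{\sqrt{T}} \to 0 .
```

Under this reading, the impossibility result concerns the first condition, the positive results concern the second, and the full-feedback and bandit-feedback settings differ in whether the cap $P_t$ may stay constant or must be allowed to grow.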