We consider the problem of steering no-regret-learning agents to play desirable equilibria via nonnegative payments. We first show that steering is impossible if the total budget (across all iterations) is finite, in both normal- and extensive-form games. However, we establish that vanishing average payments are compatible with steering. In particular, when players' full strategies are observed at each timestep, we show that constant per-iteration payments permit steering. In the more challenging setting where only trajectories through the game tree are observable, we show that steering is impossible with constant per-iteration payments in general extensive-form games, but possible in normal-form games or if the maximum per-iteration payment may grow with time. We supplement our positive theoretical results with experiments highlighting the efficacy of steering in large games, and show how our framework relates to optimal mechanism design and information design.
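The following toy sketch makes the full-feedback setting concrete. It is not the construction used in this work. Two Hedge (multiplicative-weights) learners play a stag hunt. A mediator pays each player a nonnegative bonus for the target action, equal to that action's shortfall against the best response plus a margin eps. This makes the target action strictly dominant in the payment-augmented game. The game, the target (Stag, Stag), the learning rate, the initial head start on Hare, and the names `simulate`, `payment`, and `gap` are all illustrative assumptions.

```python
import math

# Stag hunt, row player's payoffs (the game is symmetric).
# Action 0 = Stag, action 1 = Hare. (Stag, Stag) and (Hare, Hare) are both pure Nash equilibria.
U = [[4.0, 0.0],
     [3.0, 3.0]]
TARGET = 0  # hypothetical choice: steer toward the payoff-dominant (Stag, Stag) equilibrium


def gap(b):
    """How much the best response to opponent action b beats the target action (always >= 0)."""
    return max(U[a][b] for a in range(2)) - U[TARGET][b]


def payment(a, b, eps):
    """Nonnegative bonus for the target action; makes TARGET strictly dominant by at least eps."""
    return gap(b) + eps if a == TARGET else 0.0


def hedge_probs(scores, eta):
    """Hedge / multiplicative weights: softmax of eta * cumulative utility (max-shifted for stability)."""
    m = max(scores)
    w = [math.exp(eta * (s - m)) for s in scores]
    z = sum(w)
    return [wi / z for wi in w]


def simulate(T=2000, eta=0.05, eps=0.1, steer=True, hare_head_start=40.0):
    # scores[i][a]: cumulative expected utility (game payoff plus payment) of action a for player i.
    # The head start on Hare makes both learners initially favor the Hare equilibrium.
    scores = [[0.0, hare_head_start] for _ in range(2)]
    total_paid, last_paid = 0.0, 0.0
    for _ in range(T):
        x = [hedge_probs(s, eta) for s in scores]  # both players' current mixed strategies
        last_paid = 0.0
        for i in range(2):
            y = x[1 - i]  # full feedback: the opponent's whole mixed strategy is observed
            for a in range(2):
                util = sum(y[b] * U[a][b] for b in range(2))
                pay = sum(y[b] * payment(a, b, eps) for b in range(2)) if steer else 0.0
                scores[i][a] += util + pay
                last_paid += x[i][a] * pay  # expected payment actually made to player i
        total_paid += last_paid
    return {
        "stag_prob": [hedge_probs(s, eta)[TARGET] for s in scores],
        "avg_payment": total_paid / T,
        "last_payment": last_paid,
    }


if __name__ == "__main__":
    print(simulate(steer=False))
    print(simulate(steer=True))
```

Without payments, both learners settle on Hare. With payments, they move to Stag. Because (Stag, Stag) is an equilibrium, the shortfall term vanishes there, so the per-round payment settles near eps per player. Shrinking eps lowers that residual payment. This gives some intuition for steering with vanishing average payments, although the sketch makes no claim about rates.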