A mediator observes no-regret learners playing an extensive-form game repeatedly across $T$ rounds. The mediator attempts to steer players toward some desirable predetermined equilibrium by giving (nonnegative) payments to players. We call this the steering problem. The steering problem captures several problems of interest, among them equilibrium selection and information design (persuasion). If the mediator's budget is unbounded, steering is trivial because the mediator can simply pay the players to play desirable actions. We study two bounds on the mediator's payments: a total budget and a per-round budget. If the mediator's total budget does not grow with $T$, we show that steering is impossible. However, we show that it is enough for the total budget to grow sublinearly with $T$, that is, for the average payment to vanish. When players' full strategies are observed at each round, we show that constant per-round budgets permit steering. In the more challenging setting where only trajectories through the game tree are observable, we show that steering is impossible with constant per-round budgets in general extensive-form games, but possible in normal-form games or if the per-round budget may itself depend on $T$. We also show how our results generalize to the case where the equilibrium is computed online while steering takes place. We supplement our theoretical positive results with experiments highlighting the efficacy of steering in large games.