We introduce a new approach for computing optimal equilibria via learning in games. It applies to extensive-form settings with any number of players, including mechanism design, information design, and solution concepts such as correlated, communication, and certification equilibria. We observe that optimal equilibria are minimax equilibrium strategies of a player in an extensive-form zero-sum game. This reformulation allows us to apply techniques for learning in zero-sum games, yielding the first learning dynamics that converge to optimal equilibria, not only in empirical averages, but also in iterates. We demonstrate the practical scalability and flexibility of our approach by attaining state-of-the-art performance in benchmark tabular games, and by computing an optimal mechanism for a sequential auction design problem using deep reinforcement learning.
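To make the distinction between convergence "in empirical averages" and "in iterates" concrete, the following is a minimal sketch on a toy 2x2 zero-sum matrix game. It is not the paper's zero-sum reformulation or its algorithm: the payoff matrix, the function names (`clip01`, `run`), the step size, and the choice of optimistic gradient descent-ascent as the learning rule are all assumptions made for illustration. The row player maximizes and the column player minimizes f(p, q) = 5pq - 2p - 2q + 1, the expected payoff of the matrix [[2, -1], [-1, 1]] when the row and column players pick their first action with probabilities p and q; the unique equilibrium is p = q = 0.4.

```python
def clip01(v):
    """Project a scalar onto the interval [0, 1]."""
    return min(1.0, max(0.0, v))


def run(steps=5000, eta=0.05, optimistic=True, p0=0.7, q0=0.3):
    """Run projected gradient dynamics on f(p, q) = 5pq - 2p - 2q + 1.

    Hypothetical toy setup: the row player (p) ascends, the column
    player (q) descends. With optimistic=True each player first takes
    a look-ahead step using the previous gradient (optimistic gradient
    descent-ascent); otherwise this is plain projected gradient
    descent-ascent. Returns the last played pair (p, q).
    """
    p_hat, q_hat = p0, q0      # base ("secondary") iterates
    g_prev, h_prev = 0.0, 0.0  # previous gradients, used as predictions
    p, q = p0, q0
    for _ in range(steps):
        if optimistic:
            # Play a point predicted from the previous gradient.
            p = clip01(p_hat + eta * g_prev)
            q = clip01(q_hat - eta * h_prev)
        else:
            p, q = p_hat, q_hat
        g = 5.0 * q - 2.0  # df/dp at the played point
        h = 5.0 * p - 2.0  # df/dq at the played point
        # Update the base iterates with the freshly observed gradients.
        p_hat = clip01(p_hat + eta * g)
        q_hat = clip01(q_hat - eta * h)
        g_prev, h_prev = g, h
    return p, q


if __name__ == "__main__":
    print("optimistic:", run(optimistic=True))
    print("plain:     ", run(optimistic=False))
```

In this sketch the optimistic iterates themselves should approach p = q = 0.4, whereas the plain dynamics spiral outward and keep cycling near the boundary of the strategy space, so only their time averages can be informative. Obtaining convergence in iterates is the kind of property that the zero-sum reformulation makes available.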