In the literature on game-theoretic equilibrium finding, the focus has mainly been on solving a single game in isolation. In practice, however, strategic interactions -- ranging from routing problems to online advertising auctions -- evolve dynamically, giving rise to many similar games that must be solved. To address this gap, we introduce meta-learning for equilibrium finding and learning to play games. We establish the first meta-learning guarantees for a variety of fundamental and well-studied classes of games, including two-player zero-sum games, general-sum games, and Stackelberg games. In particular, we obtain rates of convergence to different game-theoretic equilibria that depend on natural notions of similarity between the games in the sequence encountered, while at the same time recovering the known single-game guarantees when the sequence of games is arbitrary. Along the way, we prove a number of new results in the single-game regime through a simple and unified framework, which may be of independent interest. Finally, we evaluate our meta-learning algorithms on endgames faced by the poker agent Libratus against top human professionals. The experiments show that games with varying stack sizes can be solved significantly faster using our meta-learning techniques than by solving them separately, often by an order of magnitude.