In their seminal work, Nayyar et al. (2013) showed that imperfect information can be abstracted away from common-payoff games by having players publicly announce their policies as they play. This insight underpins sound solvers and decision-time planning algorithms for common-payoff games. Unfortunately, a naive application of the same insight to two-player zero-sum games fails because Nash equilibria of the game with public policy announcements may not correspond to Nash equilibria of the original game. As a consequence, existing sound decision-time planning algorithms require complicated additional mechanisms that have unappealing properties. The main contribution of this work is showing that certain regularized equilibria do not suffer from this non-correspondence problem, so computing them can be treated as a perfect-information problem. Because these regularized equilibria can be made arbitrarily close to Nash equilibria, our result opens the door to a new perspective on solving two-player zero-sum games and yields a simplified framework for decision-time planning in such games, free of the unappealing properties that plague existing approaches.