Perfect Bayesian Equilibrium (PBE) is a refinement of Nash equilibrium for imperfect-information extensive-form games (EFGs) that enforces consistency between the two components of a solution: the agents' strategy profile, which describes their decisions at information sets, and the belief system, which quantifies their uncertainty over histories within an information set. We present a scalable approach for computing a PBE of an arbitrary two-player EFG. We adopt the definition of PBE enunciated by Bonanno in 2011, which employs a consistency concept based on the theory of belief revision due to Alchourrón, Gärdenfors, and Makinson. Our algorithm for finding a PBE is an adaptation of Counterfactual Regret Minimization (CFR) that minimizes the expected regret at each information set given a belief system, while maintaining the necessary consistency criteria. We prove that our algorithm is correct for two-player zero-sum games and incurs only a modest time-complexity overhead relative to classical CFR, reflecting the additional computation needed for refinement. We also experimentally demonstrate the strong performance of PBE-CFR, in terms of both equilibrium quality and running time, on medium-to-large non-zero-sum EFGs. Finally, we investigate the effectiveness of using PBE for strategy exploration in empirical game-theoretic analysis. Specifically, we employ PBE as a meta-strategy solver (MSS) in a tree-exploiting variant of Policy Space Response Oracles (TE-PSRO). Our experiments show that PBE as an MSS leads to higher-quality empirical EFG models with complex imperfect-information structures compared to MSSs based on an unrefined Nash equilibrium.