Perfect Bayesian Equilibrium (PBE) is a refinement of Nash equilibrium for imperfect-information extensive-form games (EFGs) that enforces consistency between the two components of a solution: the strategy profile, describing agents' decisions at their information sets, and the belief system, quantifying their uncertainty over histories within an information set. We present a scalable approach for computing a PBE of an arbitrary two-player EFG. We adopt the definition of PBE formulated by Bonanno in 2011, which grounds its consistency requirement in the theory of belief revision due to Alchourrón, Gärdenfors, and Makinson. Our algorithm for finding a PBE is an adaptation of Counterfactual Regret Minimization (CFR) that minimizes the expected regret at each information set given a belief system, while maintaining the required consistency criteria. We prove that our algorithm is correct for two-player zero-sum games and that it incurs only a reasonable slowdown in time complexity relative to classical CFR, given the additional computation needed for refinement. We also demonstrate experimentally that PBE-CFR performs well, in both equilibrium quality and running time, on medium-to-large non-zero-sum EFGs. Finally, we investigate the effectiveness of using PBE for strategy exploration in empirical game-theoretic analysis. Specifically, we employ PBE as a meta-strategy solver (MSS) in a tree-exploiting variant of Policy Space Response Oracles (TE-PSRO). Our experiments show that PBE as an MSS yields higher-quality empirical models of EFGs with complex imperfect-information structures than MSSs based on an unrefined Nash equilibrium.
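The algorithm summarized above adapts CFR, whose core building block is regret matching at each decision point. As a minimal sketch of that regret-minimization core only (not the paper's PBE-CFR, which additionally maintains a consistent belief system over histories), the following illustrates regret-matching self-play on rock–paper–scissors, a one-shot zero-sum game where the time-averaged strategies of both learners converge to the uniform Nash equilibrium; all function names here are illustrative, not taken from the paper.

```python
# Payoff matrix for the row player: actions are rock=0, paper=1, scissors=2.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def regret_matching(cum_regret):
    """Turn cumulative regrets into a strategy: play each action in
    proportion to its positive regret; fall back to uniform if none."""
    pos = [max(r, 0.0) for r in cum_regret]
    total = sum(pos)
    n = len(cum_regret)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n

def expected_utils(player, opp_strategy):
    """Expected utility of each action against the opponent's mixed strategy
    (the column player's payoff is the negation of the row player's)."""
    if player == 0:
        return [sum(opp_strategy[b] * PAYOFF[a][b] for b in range(3))
                for a in range(3)]
    return [sum(opp_strategy[b] * -PAYOFF[b][a] for b in range(3))
            for a in range(3)]

def self_play_rps(iterations=100_000):
    """Deterministic regret-matching self-play; returns each player's
    time-averaged strategy, which approaches uniform (the RPS Nash)."""
    # A slight initial asymmetry so the dynamics do not start at the fixed point.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strat_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strats = [regret_matching(regrets[p]) for p in (0, 1)]
        for p in (0, 1):
            utils = expected_utils(p, strats[1 - p])
            realized = sum(s * u for s, u in zip(strats[p], utils))
            for a in range(3):
                regrets[p][a] += utils[a] - realized  # counterfactual regret update
                strat_sums[p][a] += strats[p][a]
    return [[s / iterations for s in strat_sums[p]] for p in (0, 1)]
```

CFR applies this same regret update independently at every information set of an EFG, weighting utilities by counterfactual reach probabilities; PBE-CFR, per the abstract, further conditions the regret computation on a belief system kept consistent in the AGM sense.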