Subgame solving is a technique for scaling algorithms to large games by locally refining a precomputed blueprint strategy during gameplay. While straightforward in perfect-information games, where search starts from the current state, subgame solving in imperfect-information games must account for hidden states and uncertainty about the opponent's past strategy. Gadget games were developed to ensure that the improved subgame strategy is robust against any possible opponent strategy in a zero-sum game. However, gadget games typically contain infinitely many Nash equilibria. We demonstrate that while these equilibria are equivalent in the gadget game, they yield vastly different performance in the full game, even when facing a rational opponent. We propose gadget game sequential equilibria as the preferred solution concept. We introduce modifications to the sequence-form linear program and counterfactual regret minimization that converge to these refined solutions with only mild additional computational cost. Additionally, we provide several new insights into the surprising superiority of the resolving gadget game over the max-margin gadget game. Our experiments compare different Nash equilibria of gadget games in several standard benchmark games, showing that our refined equilibria consistently outperform unrefined Nash equilibria and can reduce the exploitability of the overall strategy by more than 50%.
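To make the notion of exploitability used in the abstract concrete, the following is a minimal sketch on a two-player zero-sum matrix game, assuming a normal-form setting rather than an extensive-form gadget game. It is not the method proposed here: it uses plain regret matching (the per-decision update underlying counterfactual regret minimization) without any equilibrium refinement. The payoff matrix and the function names `exploitability`, `regret_matching`, and `solve` are illustrative assumptions. Exploitability is computed as the sum of both players' best-response gains; some authors report half of this value.

```python
def exploitability(payoff, row, col):
    """Sum of both players' best-response gains in a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    -payoff[i][j]. The value is 0 exactly at a Nash equilibrium.
    """
    n, m = len(payoff), len(payoff[0])
    # Row player's payoff for each pure action against the column strategy.
    row_values = [sum(payoff[i][j] * col[j] for j in range(m)) for i in range(n)]
    # Row player's payoff when the column player answers with pure action j.
    col_values = [sum(payoff[i][j] * row[i] for i in range(n)) for j in range(m)]
    return max(row_values) - min(col_values)


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def solve(payoff, iterations=10000):
    """Self-play with regret matching; returns the average strategies."""
    n, m = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n, [0.0] * m
    row_sum, col_sum = [0.0] * n, [0.0] * m
    for _ in range(iterations):
        row = regret_matching(row_regret)
        col = regret_matching(col_regret)
        row_values = [sum(payoff[i][j] * col[j] for j in range(m)) for i in range(n)]
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n)) for j in range(m)]
        row_ev = sum(row[i] * row_values[i] for i in range(n))
        col_ev = -row_ev  # zero-sum: the column player's value is the negation
        for i in range(n):
            row_regret[i] += row_values[i] - row_ev
            row_sum[i] += row[i]
        for j in range(m):
            col_regret[j] += col_values[j] - col_ev
            col_sum[j] += col[j]
    return [s / iterations for s in row_sum], [s / iterations for s in col_sum]


if __name__ == "__main__":
    # Hypothetical asymmetric matching-pennies payoffs, chosen for illustration.
    game = [[2.0, -1.0], [-1.0, 1.0]]
    uniform = [0.5, 0.5]
    print("uniform profile:", exploitability(game, uniform, uniform))
    row_avg, col_avg = solve(game)
    print("after regret matching:", exploitability(game, row_avg, col_avg))
```

In this toy game the equilibrium is unique, so the sketch only illustrates how exploitability is measured and driven toward zero. The equilibrium selection problem described above arises in gadget games, where many equilibria share the same gadget-game value but differ in the full game.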