By incorporating regret minimization, double oracle methods have demonstrated rapid convergence to a Nash equilibrium (NE) in normal-form games and extensive-form games, through algorithms such as online double oracle (ODO) and extensive-form double oracle (XDO), respectively. In this study, we further examine the theoretical convergence rate and sample complexity of such regret minimization-based double oracle methods, using a unified framework called Regret-Minimizing Double Oracle. Based on this framework, we extend ODO to extensive-form games and determine its sample complexity. Moreover, we show that the sample complexity of XDO can be exponential in the number of information sets $|S|$, owing to the exponentially decaying stopping threshold of its restricted games. To address this problem, we propose the Periodic Double Oracle (PDO) method, which has the lowest sample complexity among regret minimization-based double oracle methods, being only polynomial in $|S|$. Empirical evaluations on multiple poker and board games show that PDO converges significantly faster than previous double oracle algorithms and is competitive with state-of-the-art regret minimization methods.
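To make the general idea concrete, the following is a minimal sketch of a double oracle loop that uses regret matching to solve each restricted game, written for a two-player zero-sum matrix game. It is not the ODO, XDO, or PDO algorithm from this work. The function names, the fixed iteration budget `iters_per_round`, the stopping tolerance `eps`, and the rule that doubles the budget when no new action is found are all illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # payoff[a][b]: payoff to the row player (zero-sum)


def _strategy(regrets: List[float]) -> List[float]:
    """Regret matching: play proportionally to positive cumulative regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    k = len(regrets)
    return [p / total for p in pos] if total > 0 else [1.0 / k] * k


def regret_matching(payoff: Matrix, rows: List[int], cols: List[int],
                    iters: int) -> Tuple[List[float], List[float]]:
    """Average self-play strategies on the game restricted to rows x cols."""
    n, m = len(rows), len(cols)
    reg_r, reg_c = [0.0] * n, [0.0] * m
    sum_r, sum_c = [0.0] * n, [0.0] * m
    for _ in range(iters):
        x, y = _strategy(reg_r), _strategy(reg_c)
        # Expected utility of each restricted action against the opponent.
        u_r = [sum(payoff[rows[i]][cols[j]] * y[j] for j in range(m))
               for i in range(n)]
        u_c = [-sum(payoff[rows[i]][cols[j]] * x[i] for i in range(n))
               for j in range(m)]
        v_r = sum(x[i] * u_r[i] for i in range(n))
        v_c = sum(y[j] * u_c[j] for j in range(m))
        for i in range(n):
            reg_r[i] += u_r[i] - v_r
            sum_r[i] += x[i]
        for j in range(m):
            reg_c[j] += u_c[j] - v_c
            sum_c[j] += y[j]
    return [s / iters for s in sum_r], [s / iters for s in sum_c]


def double_oracle(payoff: Matrix, iters_per_round: int = 2000,
                  eps: float = 1e-2, max_rounds: int = 50
                  ) -> Tuple[List[float], List[float], float]:
    """Grow restricted action sets with best responses until the gap <= eps."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    rows, cols = [0], [0]  # assumption: start from the first action of each player
    iters = iters_per_round
    x, y, gap = [1.0], [1.0], float("inf")
    for _ in range(max_rounds):
        x, y = regret_matching(payoff, rows, cols, iters)
        # Full-game value of every pure action against the restricted averages.
        row_vals = [sum(payoff[a][cols[j]] * y[j] for j in range(len(cols)))
                    for a in range(n_rows)]
        col_vals = [sum(payoff[rows[i]][b] * x[i] for i in range(len(rows)))
                    for b in range(n_cols)]
        br_row = max(range(n_rows), key=lambda a: row_vals[a])
        br_col = min(range(n_cols), key=lambda b: col_vals[b])
        # Sum of both players' best-response gains in the full game.
        gap = row_vals[br_row] - col_vals[br_col]
        if gap <= eps:
            break
        added = False
        if br_row not in rows:
            rows.append(br_row)
            added = True
        if br_col not in cols:
            cols.append(br_col)
            added = True
        if not added:
            iters *= 2  # restricted game not solved accurately enough yet
    # Embed the restricted strategies into the full action spaces.
    full_x, full_y = [0.0] * n_rows, [0.0] * n_cols
    for i, a in enumerate(rows):
        full_x[a] = x[i]
    for j, b in enumerate(cols):
        full_y[b] = y[j]
    return full_x, full_y, gap
```

As a usage example, calling `double_oracle([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])` on rock-paper-scissors should expand the restricted game to all three actions and return near-uniform strategies for both players. How long each restricted game is solved before the oracle is queried again is exactly the kind of design choice whose effect on sample complexity the abstract discusses; the fixed budget used here is only a placeholder.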