By incorporating regret minimization, double oracle methods have demonstrated rapid convergence to Nash equilibrium (NE) in normal-form and extensive-form games, through algorithms such as Online Double Oracle (ODO) and Extensive-Form Double Oracle (XDO), respectively. In this study, we further examine the theoretical convergence rate and sample complexity of such regret-minimization-based double oracle methods, using a unified framework called Regret-Minimizing Double Oracle. Based on this framework, we extend ODO to extensive-form games and determine its sample complexity. Moreover, we demonstrate that the sample complexity of XDO can be exponential in the number of information sets $|S|$, owing to the exponentially decaying stopping threshold of restricted games. To address this, we propose the Periodic Double Oracle (PDO) method, whose sample complexity is only polynomial in $|S|$, the lowest among all existing double oracle methods. Empirical evaluations on multiple poker and board games show that PDO converges significantly faster than previous double oracle algorithms and is competitive with state-of-the-art regret minimization methods.
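To make the general idea concrete, the following is a minimal sketch of a double oracle loop whose restricted game is solved by regret minimization, written for a two-player zero-sum matrix game. It is an illustration under stated assumptions, not the ODO, XDO, or PDO procedure from the paper: the function names, the use of regret matching as the no-regret learner, the fixed iteration budget `iters`, the tolerance `tol`, and the rule of doubling the budget when no new action is found are all choices made here for the example.

```python
def regret_matching(regrets):
    """Map cumulative regrets to a mixed strategy (uniform if none are positive)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def solve_restricted(A, rows, cols, iters):
    """Regret-matching self-play on the sub-matrix of A given by rows x cols.

    A[i][j] is the row player's payoff; the column player receives -A[i][j].
    Returns the average strategies, embedded in the full action sets.
    """
    reg_r, reg_c = [0.0] * len(rows), [0.0] * len(cols)
    sum_r, sum_c = [0.0] * len(rows), [0.0] * len(cols)
    for _ in range(iters):
        x, y = regret_matching(reg_r), regret_matching(reg_c)
        # Expected payoff of each restricted action against the opponent's mix.
        u_r = [sum(A[i][j] * y[b] for b, j in enumerate(cols)) for i in rows]
        u_c = [-sum(A[i][j] * x[a] for a, i in enumerate(rows)) for j in cols]
        v_r = sum(p * u for p, u in zip(x, u_r))
        v_c = sum(p * u for p, u in zip(y, u_c))
        for a in range(len(rows)):
            reg_r[a] += u_r[a] - v_r
            sum_r[a] += x[a]
        for b in range(len(cols)):
            reg_c[b] += u_c[b] - v_c
            sum_c[b] += y[b]
    full_x, full_y = [0.0] * len(A), [0.0] * len(A[0])
    for a, i in enumerate(rows):
        full_x[i] = sum_r[a] / iters
    for b, j in enumerate(cols):
        full_y[j] = sum_c[b] / iters
    return full_x, full_y


def double_oracle(A, iters=2000, tol=1e-2, max_outer=50):
    """Grow the restricted game with full-game best responses until the gap is small."""
    rows, cols = [0], [0]  # assumption: start from each player's first action
    for _ in range(max_outer):
        x, y = solve_restricted(A, rows, cols, iters)
        row_vals = [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(A))]
        col_vals = [sum(A[i][j] * x[i] for i in range(len(x))) for j in range(len(A[0]))]
        br_r = max(range(len(row_vals)), key=row_vals.__getitem__)
        br_c = min(range(len(col_vals)), key=col_vals.__getitem__)
        # Sum of both players' best-response gains against the average strategies.
        gap = row_vals[br_r] - col_vals[br_c]
        if gap <= tol:
            break
        grew = False
        if br_r not in rows:
            rows.append(br_r)
            grew = True
        if br_c not in cols:
            cols.append(br_c)
            grew = True
        if not grew:
            iters *= 2  # no new action: solve the same restricted game more accurately
    return x, y, gap


# Usage: rock-paper-scissors, payoffs to the row player.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
x, y, gap = double_oracle(RPS)
print(x, y, gap)
```

In this sketch the restricted game is re-solved from scratch after each expansion, and the best-response gap against the full game serves as the stopping criterion. How often best responses are computed and how accurately each restricted game is solved are exactly the design choices on which the methods discussed in the abstract differ.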