The Extensive-Form Game (EFG) is a fundamental model for analyzing sequential interactions among multiple agents, and the primary challenge in solving it is reducing sample complexity. Prior work has shown that Double Oracle (DO) can replace the dependence of sample complexity on the number of information sets $|S|$ with a dependence on the final restricted game size $X$ when solving EFGs. This reduction is attributed to early convergence to a full-game Nash Equilibrium (NE) through iteratively solving restricted games. However, we prove that the state-of-the-art Extensive-Form Double Oracle (XDO) has sample complexity that is \textit{exponential} in $X$, owing to its exponentially increasing restricted-game expansion frequency. We introduce Adaptive Double Oracle (AdaDO), which reduces the sample complexity to \textit{polynomial} in $X$ by deploying the optimal expansion frequency. Furthermore, to study the principles and factors that govern sample complexity, we introduce a novel theoretical framework, Regret-Minimizing Double Oracle (RMDO), which provides directions for designing efficient DO algorithms. Empirical results demonstrate that AdaDO attains a better approximation of NE with lower sample complexity than strong baselines, including Linear CFR, MCCFR, and existing DO methods. Importantly, combining RMDO with warm starting and stochastic regret minimization further improves convergence rate and scalability, paving the way for addressing complex multi-agent tasks.
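To make the idea of "iteratively solving restricted games" concrete, the following is a minimal sketch of a generic Double Oracle loop on a small zero-sum matrix game. It is not XDO, AdaDO, or RMDO, and it works on a normal-form game rather than an EFG. The payoff matrix `PAYOFF`, the function names, the tolerance, and the use of regret matching as the restricted-game solver are all illustrative assumptions, not taken from the paper.

```python
def regret_matching(payoff, rows, cols, iters):
    """Approximately solve the restricted zero-sum game by regret-matching
    self-play. payoff[i][j] is the row player's payoff. Returns the average
    strategies of both players as dicts keyed by action index."""
    def strategy(regrets):
        pos = {a: max(r, 0.0) for a, r in regrets.items()}
        total = sum(pos.values())
        if total > 0:
            return {a: p / total for a, p in pos.items()}
        return {a: 1.0 / len(regrets) for a in regrets}

    r_reg = {i: 0.0 for i in rows}
    c_reg = {j: 0.0 for j in cols}
    r_sum = {i: 0.0 for i in rows}
    c_sum = {j: 0.0 for j in cols}
    for _ in range(iters):
        x, y = strategy(r_reg), strategy(c_reg)
        # Expected payoff of each action against the opponent's current mix.
        r_val = {i: sum(payoff[i][j] * y[j] for j in cols) for i in rows}
        c_val = {j: -sum(payoff[i][j] * x[i] for i in rows) for j in cols}
        r_ev = sum(x[i] * r_val[i] for i in rows)
        c_ev = sum(y[j] * c_val[j] for j in cols)
        for i in rows:
            r_reg[i] += r_val[i] - r_ev
            r_sum[i] += x[i]
        for j in cols:
            c_reg[j] += c_val[j] - c_ev
            c_sum[j] += y[j]
    return ({i: s / iters for i, s in r_sum.items()},
            {j: s / iters for j, s in c_sum.items()})


def double_oracle(payoff, tol=1e-2, rm_iters=5000):
    """Generic Double Oracle: solve a restricted game, compute full-game best
    responses, and expand the restricted game until no new action is added."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    rows, cols = [0], [0]  # arbitrary initial restricted game
    while True:
        x, y = regret_matching(payoff, rows, cols, rm_iters)
        # Best responses are computed over ALL actions of the full game.
        row_vals = [sum(payoff[i][j] * y[j] for j in cols) for i in range(n_rows)]
        col_vals = [sum(payoff[i][j] * x[i] for i in rows) for j in range(n_cols)]
        br_row = max(range(n_rows), key=lambda i: row_vals[i])
        br_col = min(range(n_cols), key=lambda j: col_vals[j])
        gap = row_vals[br_row] - col_vals[br_col]  # exploitability estimate
        added = False
        if gap > tol:
            if br_row not in rows:
                rows.append(br_row)
                added = True
            if br_col not in cols:
                cols.append(br_col)
                added = True
        if not added:
            # Either the gap is within tol, or the restricted solve was too
            # coarse to improve further; stop in both cases.
            return x, y, gap, rows, cols


# Hypothetical game: rock-paper-scissors plus one strictly dominated action
# (index 3) for each player. Entries are the row player's payoffs.
PAYOFF = [
    [0, -1, 1, 2],
    [1, 0, -1, 2],
    [-1, 1, 0, 2],
    [-2, -2, -2, 0],
]

x, y, gap, rows, cols = double_oracle(PAYOFF)
print(sorted(rows), sorted(cols), round(gap, 4))
```

In this toy game the dominated action is never a best response, so the loop stops with a restricted game smaller than the full game. This is the same effect, at miniature scale, that lets DO depend on the restricted game size rather than on the full game size.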