Counterfactual regret minimization (CFR) algorithms are a foundational class of methods for solving imperfect-information games: in two-player zero-sum games, the time average of their iterates converges to a Nash equilibrium. The previous state-of-the-art variants, Discounted CFR (DCFR) and Predictive CFR$^+$ (PCFR$^+$), achieved the fastest known practical performance by discounting early iterations under a fixed scheme, which improves convergence over vanilla CFR. More recently, Dynamic DCFR (DDCFR) accelerated convergence further with discounting schemes learned by an agent, at the cost of added complexity. To address this, we propose Hyperparameter Schedules (HSs), a remarkably simple, training-free framework that adjusts CFR's discounting dynamically over time. HSs aggressively downweight early updates and gradually shift trust toward late-stage strategies, yielding substantially faster convergence with only a few lines of code changes. We show that HSs derived from just three small extensive-form games generalize effectively to 17 diverse games, including large-scale realistic poker, in both extensive-form and normal-form settings, without any game-specific tuning. Our method establishes a new state of the art for solving two-player zero-sum games.
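The discounting that HSs adjust can be illustrated on a small normal-form game. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It runs DCFR-style regret matching in self-play, using Brown and Sandholm's discount factors: positive cumulative regrets are multiplied by $t^\alpha/(t^\alpha+1)$, negative ones by $t^\beta/(t^\beta+1)$, and average-strategy contributions by $(t/(t+1))^\gamma$, with DCFR's fixed defaults $(\alpha,\beta,\gamma)=(1.5,0,2)$. The `schedule` callable, which returns the hyperparameters for iteration `t`, is a hypothetical hook marking where a time-varying schedule would plug in. The actual HS formulas are not given in this abstract and are not reproduced. The 2x2 payoff matrix, the alternating updates, and the exact iteration indexing are also assumptions chosen for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical hook (illustration only): maps iteration t (1-based) to (alpha, beta, gamma).
Schedule = Callable[[int], Tuple[float, float, float]]


def dcfr_constant(t: int) -> Tuple[float, float, float]:
    """DCFR's fixed defaults (alpha, beta, gamma) = (1.5, 0, 2); t is ignored."""
    return 1.5, 0.0, 2.0


def regret_matching(cum_regret: List[float]) -> List[float]:
    """Mix actions in proportion to positive cumulative regret; uniform if none is positive."""
    pos = [max(r, 0.0) for r in cum_regret]
    total = sum(pos)
    if total <= 0.0:
        return [1.0 / len(pos)] * len(pos)
    return [p / total for p in pos]


def row_utils(A: List[List[float]], y: List[float]) -> List[float]:
    """Row player's expected payoff for each row action against column mix y."""
    return [sum(a * yj for a, yj in zip(row, y)) for row in A]


def col_utils(A: List[List[float]], x: List[float]) -> List[float]:
    """Column player's payoff for each column action (zero-sum: negated row payoff)."""
    return [-sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]


def discounted_update(cum_regret, cum_strategy, utils, strategy, t, schedule):
    """Add iteration t's regrets and strategy, then apply DCFR-style discount factors."""
    alpha, beta, gamma = schedule(t)
    pos_w = t ** alpha / (t ** alpha + 1.0)  # factor for positive cumulative regret
    neg_w = t ** beta / (t ** beta + 1.0)    # factor for negative cumulative regret
    avg_w = (t / (t + 1.0)) ** gamma         # factor for the average-strategy sum
    value = sum(u * s for u, s in zip(utils, strategy))
    for a in range(len(cum_regret)):
        r = cum_regret[a] + (utils[a] - value)
        cum_regret[a] = r * (pos_w if r > 0 else neg_w)
        cum_strategy[a] = (cum_strategy[a] + strategy[a]) * avg_w


def normalize(v: List[float]) -> List[float]:
    s = sum(v)
    return [a / s for a in v]


def solve(A: List[List[float]], iterations: int, schedule: Schedule = dcfr_constant):
    """Self-play with alternating updates; returns the weighted-average strategies."""
    m, n = len(A), len(A[0])
    rx, ry = [0.0] * m, [0.0] * n
    sx, sy = [0.0] * m, [0.0] * n
    for t in range(1, iterations + 1):
        x, y = regret_matching(rx), regret_matching(ry)
        discounted_update(rx, sx, row_utils(A, y), x, t, schedule)
        x = regret_matching(rx)  # column player responds to the updated row strategy
        discounted_update(ry, sy, col_utils(A, x), y, t, schedule)
    return normalize(sx), normalize(sy)


def exploitability(A: List[List[float]], x: List[float], y: List[float]) -> float:
    """Sum of both players' best-response gains; 0 exactly at a Nash equilibrium."""
    return max(row_utils(A, y)) + max(col_utils(A, x))


if __name__ == "__main__":
    A = [[2.0, -1.0], [-1.0, 1.0]]  # illustrative game; both players' equilibrium mix is (0.4, 0.6)
    x, y = solve(A, 10_000)
    print(x, y, exploitability(A, x, y))
```

Passing a different function of `t` as `schedule` is the kind of few-line change the abstract describes. Any such function written for this sketch would be illustrative only, not the paper's published schedule.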