Counterfactual Regret Minimization (CFR) and its variants are the best algorithms to date for solving large-scale incomplete-information games. Building on CFR, this paper proposes a new algorithm, Pure CFR (PCFR), that achieves better performance. PCFR can be seen as a combination of CFR and Fictitious Play (FP): it inherits the concept of counterfactual regret (value) from CFR, but uses the best-response strategy instead of the regret-matching strategy for the next iteration. We prove that PCFR achieves Blackwell approachability, which allows it to be combined with any CFR variant, including Monte Carlo CFR (MCCFR). The resulting Pure MCCFR (PMCCFR) can significantly reduce time and space complexity. In particular, PMCCFR converges at least three times faster than MCCFR. In addition, since PMCCFR does not pass through the paths of strictly dominated strategies, we develop a new warm-start algorithm inspired by the elimination of strictly dominated strategies. As a result, PMCCFR with the new warm-start algorithm can converge two orders of magnitude faster than CFR+.
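To make the contrast between the two update rules concrete, the following is a minimal sketch of how the next-iteration strategy at a single decision point might be derived from a vector of cumulative counterfactual regrets. It is an illustration, not the paper's implementation: the function names `regret_matching` and `pure_best_response`, the list-based representation, and the tie-breaking rule (lowest action index wins) are all assumptions made here.

```python
def regret_matching(cum_regret):
    """Standard regret matching: play each action in proportion to its
    positive cumulative regret; fall back to uniform if none is positive."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        return [1.0 / n] * n
    return [p / total for p in positive]


def pure_best_response(cum_regret):
    """Hypothetical PCFR-style rule: put all probability on the action with
    the largest cumulative regret. Ties go to the lowest index (an
    assumption; the abstract does not specify tie-breaking)."""
    best = max(range(len(cum_regret)), key=lambda a: cum_regret[a])
    return [1.0 if a == best else 0.0 for a in range(len(cum_regret))]


cum_regret = [2.0, -1.0, 6.0]
print(regret_matching(cum_regret))     # [0.25, 0.0, 0.75]
print(pure_best_response(cum_regret))  # [0.0, 0.0, 1.0]
```

The pure rule always returns a one-hot vector, so in principle it could be stored as a single action index per decision point instead of a full probability vector.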