Counterfactual Regret Minimization (CFR) and its variants are the best algorithms to date for solving large-scale incomplete-information games. However, we see two problems with CFR. First, each CFR iteration requires matrix multiplication, so the time complexity of a single iteration is high. Second, real-world games differ in their characteristics, so no single CFR algorithm is well suited to every game. To address both problems, this paper proposes a new CFR-based algorithm called Pure CFR (PCFR). PCFR can be seen as a combination of CFR and Fictitious Play (FP): it inherits the concept of counterfactual regret (value) from CFR, but uses a best-response strategy instead of the regret-matching strategy for the next iteration. The algorithm has three advantages. First, PCFR can be combined with any CFR variant; combined with Monte Carlo CFR (MCCFR), it yields Pure MCCFR (PMCCFR), which significantly reduces the time and space complexity of one iteration. Second, our experiments show that PMCCFR converges $2\sim3$ times as fast as MCCFR. Finally, there is a class of games for which PCFR is especially well suited. We call such a game a clear-game; it is characterized by a high proportion of dominated strategies. Experiments show that in clear-games, the convergence rate of PMCCFR is two orders of magnitude higher than that of MCCFR.
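The core idea, replacing the regret-matching strategy with a pure best response to the accumulated regrets, can be illustrated on a small matrix game. The following is a minimal sketch under stated assumptions, not the paper's implementation: it uses a normal-form game (rock-paper-scissors) rather than an extensive-form game tree, and the names `PAYOFF`, `pure_update_selfplay`, and `exploitability` are hypothetical. In this normal-form setting the update coincides with fictitious play. Because both players act with pure strategies, each action value is a single table lookup rather than a matrix-vector product.

```python
# Illustrative sketch only: a PCFR-style update on a two-player zero-sum
# matrix game. All names here are hypothetical, not taken from the paper.

# Payoffs for the row player; the column player receives the negation.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def argmax(values):
    """Index of the largest entry; the first index wins ties."""
    return max(range(len(values)), key=lambda i: values[i])


def pure_update_selfplay(payoff, iterations):
    """Self-play in which each player plays the pure action with the
    highest cumulative regret. Returns both average strategies."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret = [0.0] * n_rows
    col_regret = [0.0] * n_cols
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols

    for _ in range(iterations):
        # Pure strategies replace the regret-matching distribution.
        a = argmax(row_regret)
        b = argmax(col_regret)
        row_counts[a] += 1
        col_counts[b] += 1

        # The opponent's strategy is pure, so every action value is a
        # lookup in the payoff table (no matrix-vector product).
        realized = payoff[a][b]
        for i in range(n_rows):
            row_regret[i] += payoff[i][b] - realized
        for j in range(n_cols):
            col_regret[j] += realized - payoff[a][j]

    row_avg = [c / iterations for c in row_counts]
    col_avg = [c / iterations for c in col_counts]
    return row_avg, col_avg


def exploitability(payoff, row_strategy, col_strategy):
    """Sum of both players' best-response values; zero at equilibrium
    of a zero-sum game."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(
        sum(payoff[i][j] * col_strategy[j] for j in range(n_cols))
        for i in range(n_rows)
    )
    best_col = max(
        -sum(payoff[i][j] * row_strategy[i] for i in range(n_rows))
        for j in range(n_cols)
    )
    return best_row + best_col


if __name__ == "__main__":
    row_avg, col_avg = pure_update_selfplay(PAYOFF, 20000)
    print("average row strategy:", row_avg)
    print("exploitability:", exploitability(PAYOFF, row_avg, col_avg))
```

Only the average strategy is expected to approach equilibrium; the strategy played at any single iteration is pure and therefore exploitable on its own. The sketch says nothing about the sampling scheme or the clear-game experiments described above.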