Counterfactual Regret Minimization (CFR) and its variants are widely recognized as effective algorithms for solving extensive-form imperfect-information games. Much recent work has focused on accelerating CFR's convergence. However, most of these variants cannot be used with Monte Carlo (MC) sampling, which makes them unsuitable for training in large-scale games. We introduce a new MC-based algorithm for solving extensive-form imperfect-information games, called MCCFVFP (Monte Carlo Counterfactual Value-Based Fictitious Play). MCCFVFP combines CFR's counterfactual value calculations with fictitious play's best-response strategy, leveraging the strengths of fictitious play to gain a significant advantage in games with a high proportion of dominated strategies. Experimental results show that MCCFVFP converges approximately 20\%$\sim$50\% faster than the most advanced MCCFR variants in poker and other test games.
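To illustrate why a best-response update can help when many strategies are dominated, the following is a minimal sketch of plain fictitious play on a small two-player zero-sum matrix game. It is not MCCFVFP itself: it uses no game tree, no counterfactual values, and no Monte Carlo sampling. The payoff matrix (rock-paper-scissors plus one strictly dominated action per player) and the names `fictitious_play` and `exploitability` are assumptions made for this example only.

```python
import numpy as np


def fictitious_play(payoff, iterations=5000):
    """Simultaneous fictitious play in a two-player zero-sum matrix game.

    payoff[i, j] is the row player's payoff; the column player receives
    -payoff[i, j]. Returns the empirical (average) strategies of both players.
    """
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    # Arbitrary initial belief: each player has played action 0 once.
    row_counts[0] = 1.0
    col_counts[0] = 1.0
    for _ in range(iterations):
        row_avg = row_counts / row_counts.sum()
        col_avg = col_counts / col_counts.sum()
        # Expected value of each pure action against the opponent's average.
        row_values = payoff @ col_avg
        col_values = -(row_avg @ payoff)
        # Best response: all weight goes to the single highest-value action,
        # so a strictly dominated action is never selected.
        row_counts[np.argmax(row_values)] += 1.0
        col_counts[np.argmax(col_values)] += 1.0
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


def exploitability(payoff, row_strategy, col_strategy):
    """Sum of both players' best-response gains; zero at a Nash equilibrium."""
    best_row = np.max(payoff @ col_strategy)
    best_col = np.max(-(row_strategy @ payoff))
    return best_row + best_col


# Hypothetical test game: rock-paper-scissors, plus a fourth action for each
# player that is strictly dominated (index 3).
payoff = np.array([
    [0.0, -1.0, 1.0, 2.0],
    [1.0, 0.0, -1.0, 2.0],
    [-1.0, 1.0, 0.0, 2.0],
    [-2.0, -2.0, -2.0, 0.0],
])

row_strategy, col_strategy = fictitious_play(payoff)
print("row strategy:", row_strategy)
print("col strategy:", col_strategy)
print("exploitability:", exploitability(payoff, row_strategy, col_strategy))
```

In this sketch the dominated action never receives any weight, because a best response only ever selects the highest-value action. A regret-matching update, by contrast, can assign positive probability to any action whose cumulative regret is positive. How this carries over to sampled counterfactual values in extensive-form games is the subject of the paper and is not shown here.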