Counterfactual regret minimization (CFR) is a family of algorithms for effectively solving imperfect-information games. It decomposes the total regret into counterfactual regrets and minimizes them with local regret minimization algorithms such as Regret Matching (RM) or RM+. Recent research establishes a connection between Online Mirror Descent (OMD) and RM+, paving the way for an optimistic variant, PRM+, and its extension, PCFR+. However, PCFR+ assigns a uniform weight to each iteration when accumulating regrets, which leads to substantial regret in the face of dominated actions. This work explores minimizing weighted counterfactual regret with optimistic OMD, resulting in a novel CFR variant, PDCFR+. It integrates PCFR+ and Discounted CFR (DCFR) in a principled manner, swiftly mitigating the negative effects of dominated actions while consistently leveraging predictions to accelerate convergence. Theoretical analysis proves that PDCFR+ converges to a Nash equilibrium, in particular under distinct weighting schemes for regrets and average strategies. Experimental results demonstrate PDCFR+'s fast convergence in common imperfect-information games. The code is available at https://github.com/rpSebastian/PDCFRPlus.
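To make the contrast concrete, the sketch below illustrates the two building blocks the abstract refers to: Regret Matching, which plays actions in proportion to positive cumulative regret, and a DCFR-style discounting step, which down-weights old regrets instead of weighting every iteration uniformly. This is a minimal illustration, not the paper's PDCFR+ algorithm; the function names are ours, and the default exponents follow the commonly cited DCFR settings (alpha = 1.5, beta = 0).

```python
def regret_matching_strategy(cum_regret):
    """Regret Matching: play each action in proportion to its positive
    cumulative regret; if no action has positive regret, play uniformly."""
    pos = [max(r, 0.0) for r in cum_regret]
    total = sum(pos)
    n = len(cum_regret)
    if total > 0:
        return [p / total for p in pos]
    return [1.0 / n] * n


def discount_regrets(cum_regret, t, alpha=1.5, beta=0.0):
    """DCFR-style discounting (illustrative sketch): at iteration t, scale
    positive accumulated regrets by t^alpha / (t^alpha + 1) and negative
    ones by t^beta / (t^beta + 1), so regret incurred early, e.g. on
    dominated actions, fades rather than carrying a uniform weight."""
    w_pos = t ** alpha / (t ** alpha + 1)
    w_neg = t ** beta / (t ** beta + 1)
    return [r * w_pos if r > 0 else r * w_neg for r in cum_regret]
```

Under uniform weighting (plain RM/RM+ accumulation), a dominated action's large early regret keeps influencing the strategy for many iterations; the discounting step shrinks that influence at every iteration, which is the effect PDCFR+ inherits from DCFR.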