In this paper, we investigate the power of \textit{regularization}, a common technique in reinforcement learning and optimization, in solving extensive-form games (EFGs). We propose a series of new algorithms based on regularizing the payoff functions of the game, and establish a set of convergence results that strictly improve over the existing ones, with either weaker assumptions or stronger convergence guarantees. In particular, we first show that dilated optimistic mirror descent (DOMD), an efficient variant of optimistic mirror descent (OMD) for solving EFGs, with adaptive regularization can achieve a fast $\tilde O(1/T)$ last-iterate convergence rate in terms of duality gap and distance to the set of Nash equilibria (NE), without assuming that the NE is unique. Second, we show that regularized counterfactual regret minimization (\texttt{Reg-CFR}), with a variant of the optimistic mirror descent algorithm as the regret minimizer, can achieve $O(1/T^{1/4})$ best-iterate and $O(1/T^{3/4})$ average-iterate convergence rates for finding NE in EFGs. Finally, we show that \texttt{Reg-CFR} can achieve asymptotic last-iterate convergence and an optimal $O(1/T)$ average-iterate convergence rate for finding the NE of perturbed EFGs, which is useful for finding approximate extensive-form perfect equilibria (EFPE). To the best of our knowledge, these constitute the first last-iterate convergence results for CFR-type algorithms, while matching the state-of-the-art average-iterate convergence rate in finding NE for non-perturbed EFGs. We also provide numerical results to corroborate the advantages of our algorithms.
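To make the idea of regularizing the payoff function concrete, the following is a minimal sketch on a much simpler setting than the one studied here: a two-player zero-sum matrix game rather than an EFG, with a fixed entropy regularization weight rather than an adaptive one. It is not DOMD or \texttt{Reg-CFR}. The function names (`regularized_omwu`, `duality_gap`), the parameters `tau` and `eta`, and the starting point are all assumptions chosen for illustration. The row player minimizes $x^\top A y + \tau \sum_i x_i \log x_i$, the column player maximizes $x^\top A y - \tau \sum_j y_j \log y_j$, and both run an optimistic multiplicative-weights update on the regularized gradients.

```python
import numpy as np


def softmax(z):
    """Map log-weights to a probability vector (numerically stable)."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def duality_gap(A, x, y):
    """Duality gap of (x, y) in the ORIGINAL (unregularized) game min_x max_y x^T A y."""
    return float(np.max(A.T @ x) - np.min(A @ y))


def regularized_omwu(A, tau=0.1, eta=0.1, iters=3000):
    """Optimistic multiplicative weights on an entropy-regularized matrix game.

    Illustrative assumptions: fixed regularization weight `tau`, fixed step
    size `eta`, and an arbitrary interior starting point.
    Returns the LAST iterate (x, y), not an average.
    """
    n, m = A.shape
    # Log-weights of the "base" iterates; start away from uniform.
    x_hat = np.log(softmax(np.linspace(0.0, 1.0, n)))
    y_hat = np.log(softmax(np.linspace(1.0, 0.0, m)))
    # Previous gradients, used as the optimistic prediction.
    gx = np.zeros(n)
    gy = np.zeros(m)
    x, y = softmax(x_hat), softmax(y_hat)
    for _ in range(iters):
        # Optimistic step: play using the previous gradient as a prediction.
        x = softmax(x_hat - eta * gx)
        y = softmax(y_hat - eta * gy)
        # Gradients of the regularized losses at the played point
        # (the constant from d/dx of x*log(x) cancels under normalization).
        gx = A @ y + tau * np.log(x)
        gy = -A.T @ x + tau * np.log(y)
        # Base update with the freshly observed gradient.
        x_hat = np.log(softmax(x_hat - eta * gx))
        y_hat = np.log(softmax(y_hat - eta * gy))
    return x, y


if __name__ == "__main__":
    # Rock-paper-scissors: the unique NE is uniform play for both players.
    A = np.array([[0.0, 1.0, -1.0],
                  [-1.0, 0.0, 1.0],
                  [1.0, -1.0, 0.0]])
    x, y = regularized_omwu(A)
    print("last iterate x:", x)
    print("duality gap:", duality_gap(A, x, y))
```

In rock-paper-scissors the regularized equilibrium coincides with the NE by symmetry, so the last iterate approaches the NE itself. In a general game, a fixed `tau` would only recover the equilibrium of the regularized game, which is one motivation for shrinking the regularization over time.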