We study policy optimization algorithms for computing correlated equilibria in multi-player general-sum Markov games. Previous results achieve an $O(T^{-1/2})$ convergence rate to a correlated equilibrium and an accelerated $O(T^{-3/4})$ convergence rate to the weaker notion of coarse correlated equilibrium. In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium. Our algorithm is constructed by combining two main elements: (i) smooth value updates and (ii) the optimistic follow-the-regularized-leader (OFTRL) algorithm with the log-barrier regularizer.
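For reference, a standard form of the OFTRL update with the log-barrier regularizer over the probability simplex $\Delta(\mathcal{A})$ is sketched below; this is a generic template (with assumed notation: observed losses $\ell^1, \dots, \ell^t$ and learning rate $\eta$), and the paper's per-state instantiation may differ:
$$
x^{t+1} \;=\; \operatorname*{argmin}_{x \in \Delta(\mathcal{A})} \; \eta \Big\langle x, \; \sum_{s=1}^{t} \ell^s + \ell^t \Big\rangle \;+\; \sum_{a \in \mathcal{A}} \log \frac{1}{x_a},
$$
where the additional $\ell^t$ term inside the inner product is the optimistic prediction of the next loss, and $\sum_{a} \log(1/x_a)$ is the log-barrier regularizer.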