In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play. This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$, and settles a recent open question by Bai et al. (2022). As an immediate consequence, we guarantee convergence to the set of extensive-form correlated equilibria and coarse correlated equilibria at a near-optimal rate of $\frac{\log T}{T}$. Building on prior work, at the heart of our construction lies a more general result regarding fixed points deriving from rational functions with polynomial degree, a property that we establish for the fixed points of (coarse) trigger deviation functions. Moreover, our construction leverages a refined regret circuit for the convex hull, which, unlike prior guarantees, preserves the RVU property introduced by Syrgkanis et al. (NIPS, 2015); this observation is of independent interest for establishing near-optimal regret under learning dynamics based on a CFR-type decomposition of the regret.
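The step from the regret bound to the convergence rate stated in the abstract can be sketched with the standard regret-to-equilibrium argument. The symbols $n$ (number of players), $\mathrm{Reg}_i^T$ (trigger regret of player $i$ after $T$ rounds), $\bar{\mu}^T$ (empirical distribution of play), $\epsilon_T$, and $C$ (a constant depending on the game but not on $T$) are notation assumed here for illustration and do not come from the abstract. If every player's trigger regret satisfies $\mathrm{Reg}_i^T \le C \log T$, then $\bar{\mu}^T$ is an $\epsilon_T$-approximate extensive-form correlated equilibrium with

$$
\epsilon_T \;\le\; \frac{1}{T} \max_{1 \le i \le n} \mathrm{Reg}_i^T \;\le\; \frac{C \log T}{T}.
$$

Under the same argument, the earlier $O(T^{1/4})$ regret bound would give only $\epsilon_T = O(T^{-3/4})$. The coarse case follows in the same way, with coarse trigger regret in place of trigger regret.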