In this paper, we introduce the first algorithmic framework for Blackwell approachability on the sequence-form polytope, the class of convex polytopes that capture the strategies of players in extensive-form games (EFGs). This leads to a new class of regret-minimization algorithms that are stepsize-invariant, in the same sense as the Regret Matching and Regret Matching$^+$ algorithms for the simplex. Our modular framework can be combined with any existing regret minimizer over cones to compute a Nash equilibrium in two-player zero-sum EFGs with perfect recall, through the self-play framework. Leveraging predictive online mirror descent, we introduce Predictive Treeplex Blackwell$^+$ (PTB$^+$) and show an $O(1/\sqrt{T})$ convergence rate to Nash equilibrium in self-play. We then show how to stabilize PTB$^+$ with a stepsize, resulting in an algorithm with a state-of-the-art $O(1/T)$ convergence rate. We provide an extensive set of experiments comparing our framework with several algorithmic benchmarks, including CFR$^+$ and its predictive variant, and we highlight interesting connections between practical performance and the stepsize-dependence or stepsize-invariance properties of classical algorithms.
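To make the notion of stepsize invariance concrete, the following is a minimal sketch of Regret Matching$^+$ on the simplex, the classical algorithm the abstract uses as its point of comparison. It is not the treeplex algorithm proposed in the paper. The function name `rm_plus` and the artificial `eta` parameter are assumptions introduced here for illustration only: Regret Matching$^+$ is normally stated without a stepsize, and `eta` is added solely to show that rescaling the regret updates by a positive constant leaves the played strategies unchanged.

```python
import random


def rm_plus(losses, eta=1.0):
    """Run Regret Matching+ on the simplex and return the strategies played.

    `eta` is a hypothetical stepsize, included only to illustrate invariance.
    """
    n = len(losses[0])
    R = [0.0] * n  # nonnegative aggregate-regret vector, starts at zero
    played = []
    for loss in losses:
        total = sum(R)
        # Play R normalized to the simplex; fall back to uniform when R is zero.
        x = [r / total for r in R] if total > 0 else [1.0 / n] * n
        played.append(x)
        # Instantaneous regret of each action against the strategy just played.
        expected = sum(xi * li for xi, li in zip(x, loss))
        # Accumulate (scaled) regret and truncate at zero (the "+" step).
        R = [max(r + eta * (expected - li), 0.0) for r, li in zip(R, loss)]
    return played


random.seed(0)
losses = [[random.random() for _ in range(3)] for _ in range(50)]

base = rm_plus(losses, eta=1.0)
scaled = rm_plus(losses, eta=4.0)

# Largest coordinate-wise difference between the two runs.
max_gap = max(
    abs(a - b) for xa, xb in zip(base, scaled) for a, b in zip(xa, xb)
)
print(max_gap)
```

Because the aggregate vector starts at zero, multiplying every update by `eta` multiplies the whole vector by `eta` at every iteration, and the normalization step cancels that factor. The two runs therefore play the same strategies up to floating-point error, which is the property the abstract refers to as stepsize invariance.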