Coordinate descent methods are popular in machine learning and optimization for their simple sparse updates and excellent practical performance. In the context of large-scale sequential game solving, these same properties would be attractive, but until now no such methods were known, because the strategy spaces do not satisfy the typical separable block structure exploited by such methods. We present the first cyclic coordinate-descent-like method for the polytope of sequence-form strategies, which form the strategy spaces for the players in an extensive-form game (EFG). Our method exploits the recursive structure of the proximal update induced by what are known as dilated regularizers to allow for a pseudo block-wise update. We show that our method enjoys an $O(1/T)$ convergence rate to a two-player zero-sum Nash equilibrium, while avoiding the worst-case polynomial scaling with the number of blocks common to cyclic methods. We empirically show that our algorithm usually performs better than other state-of-the-art first-order methods (i.e., mirror prox), and occasionally can even beat CFR$^+$, a state-of-the-art algorithm for numerical equilibrium computation in zero-sum EFGs. We then introduce a restarting heuristic for EFG solving. We show empirically that restarting can lead to speedups, sometimes huge, both for our cyclic method and for existing methods such as mirror prox and predictive CFR$^+$.
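For orientation, the setup referred to above can be sketched in the notation commonly used in the EFG literature. The symbols below are assumptions introduced for illustration and do not appear in the abstract: $X$ and $Y$ are the two players' sequence-form polytopes, $A$ is the payoff matrix, $\mathcal{J}$ is the set of decision points of one player, $p_j$ is the parent sequence of decision point $j$ (with $x_{p_j} = 1$ when the parent is the empty sequence), $x^j$ is the block of $x$ indexed by the actions at $j$, $\omega_j$ is a strongly convex regularizer on the simplex (for example, negative entropy), and $\alpha_j > 0$ are weights.

$$
\min_{x \in X} \max_{y \in Y} \; \langle x, A y \rangle,
\qquad
\omega(x) \;=\; \sum_{j \in \mathcal{J}} \alpha_j \, x_{p_j} \, \omega_j\!\left( \frac{x^j}{x_{p_j}} \right).
$$

In a regularizer of this dilated form, each term depends on $x$ only through the local ratio $x^j / x_{p_j}$, scaled by the parent value $x_{p_j}$. This is why the associated proximal update can be evaluated recursively over the decision tree, one decision point at a time, which is presumably the recursive structure that the pseudo block-wise update relies on.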