Contemporary applications of machine learning in two-team e-sports and the superior expressivity of multi-agent generative adversarial networks raise important and overlooked theoretical questions regarding optimization in two-team games. Formally, two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents, each experiencing a utility identical to that of their teammates and opposite to that of the opposing team. We focus on the solution concept of Nash equilibria (NE). We first show that computing NE for this class of games is $\textit{hard}$ for the complexity class ${\mathrm{CLS}}$. To further examine the capabilities of online learning algorithms in games with full-information feedback, we propose a benchmark of a simple -- yet nontrivial -- family of such games. These games do not enjoy the properties used to prove convergence for relevant algorithms. In particular, we use a dynamical systems perspective to demonstrate that gradient descent-ascent, its optimistic variant, optimistic multiplicative weights update, and extra gradient fail to converge (even locally) to a Nash equilibrium. On a brighter note, we propose a first-order method that leverages control theory techniques and under some conditions enjoys last-iterate local convergence to a Nash equilibrium. We also believe our proposed method is of independent interest for general min-max optimization.
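The non-convergence of gradient descent-ascent mentioned above can be seen in a much simpler setting than the two-team games studied here. The sketch below is not the paper's benchmark family: it assumes the scalar bilinear objective $f(x, y) = xy$, with $x$ minimizing and $y$ maximizing, whose unique Nash equilibrium is $(0, 0)$. The function name `gda_bilinear` and the step size are illustrative choices. Under simultaneous updates, the distance to the equilibrium grows by a factor of $\sqrt{1 + \eta^2}$ per step, so the iterates spiral away from it.

```python
import math


def gda_bilinear(x0, y0, eta, steps):
    """Simultaneous gradient descent-ascent on the toy objective f(x, y) = x * y.

    x takes a descent step and y takes an ascent step, both computed from
    the current iterate. Returns the distance to the equilibrium (0, 0)
    at every iteration, starting with the initial point.
    """
    x, y = x0, y0
    norms = [math.hypot(x, y)]
    for _ in range(steps):
        grad_x, grad_y = y, x  # df/dx = y, df/dy = x
        x, y = x - eta * grad_x, y + eta * grad_y
        norms.append(math.hypot(x, y))
    return norms


# Illustrative run: hypothetical start point and step size.
norms = gda_bilinear(x0=1.0, y0=1.0, eta=0.1, steps=100)
print(f"initial distance: {norms[0]:.4f}, final distance: {norms[-1]:.4f}")
```

This toy case only illustrates the general phenomenon. The failure results stated in the abstract concern the proposed family of two-team games and also cover the optimistic and extra-gradient variants, which this sketch does not model.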