Two-team zero-sum games are one of the most important paradigms in game theory. In this paper, we focus on finding an unexploitable equilibrium in large team games. An unexploitable equilibrium is a worst-case policy against which members of the opponent team cannot increase their team reward by adopting any other policy, for example by cooperatively switching to a different joint policy. As an optimal unexploitable equilibrium in two-team zero-sum games, the correlated-team maxmin equilibrium remains unexploitable even in the worst case, where players in the opponent team can achieve arbitrary cooperation through a joint team policy. However, finding such an equilibrium in large games is challenging because evaluating the exponentially large number of joint policies is impractical. To solve this problem, we first introduce a general solution concept called the restricted correlated-team maxmin equilibrium, which uses a sample factor to address the impossibility of evaluating all joint policies while avoiding the exploitation problem that arises under incomplete joint policy evaluation. We then develop an efficient sequential correlation mechanism and, based on it, propose an algorithm for approximating the unexploitable equilibrium in large games. We show that our approach achieves lower exploitability than the state-of-the-art baseline against opponent teams with different exploitation abilities in large team games, including Google Research Football.
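As a toy illustration of why exhaustive evaluation does not scale, the sketch below computes the worst-case value of a fixed policy in a small normal-form team game, first over every opponent joint action and then over a random subset. It is not the algorithm or the solution concept proposed in the paper: the payoff table, the team sizes, the subset size of 8, and all function names are assumptions made only for this example.

```python
import itertools
import random


def joint_actions(num_members, num_actions):
    """All joint actions of a team: num_actions ** num_members tuples."""
    return list(itertools.product(range(num_actions), repeat=num_members))


def expected_payoff(policy, payoff, opp_joint):
    """Expected payoff to our team when the opponents play opp_joint."""
    return sum(prob * payoff[(a, opp_joint)] for a, prob in policy.items())


def worst_case_value(policy, payoff, candidates):
    """Minimum expected payoff over the opponent joint actions considered.

    Expected payoff is linear in the opponent's distribution, so the minimum
    over correlated joint policies is attained at a pure joint action.
    """
    return min(expected_payoff(policy, payoff, j) for j in candidates)


rng = random.Random(0)  # fixed seed so the toy game is reproducible

# Hypothetical game: 3 opponents with 4 actions each -> 4 ** 3 = 64 joint actions.
opp_joint = joint_actions(num_members=3, num_actions=4)
our_actions = [0, 1]

# Made-up zero-sum payoffs to our team (the opponent team receives the negative).
payoff = {(a, j): rng.uniform(-1.0, 1.0) for a in our_actions for j in opp_joint}

policy = {0: 0.5, 1: 0.5}  # an arbitrary fixed mixed policy for our team

full = worst_case_value(policy, payoff, opp_joint)
sampled = worst_case_value(policy, payoff, rng.sample(opp_joint, 8))

print(f"joint actions: {len(opp_joint)}")
print(f"worst case over all joint actions: {full:.3f}")
print(f"worst case over 8 sampled actions: {sampled:.3f}")
```

Because the sampled set is a subset of all joint actions, the sampled worst case can never be lower than the full one. A policy that looks safe under incomplete evaluation may therefore still be exploitable by a joint action that was never evaluated.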