Nash Equilibrium (NE) is the canonical solution concept of game theory and provides an elegant tool for understanding rational behavior. Although a mixed-strategy NE exists in any game with finitely many players and actions, computing an NE in two- or multi-player general-sum games is PPAD-complete. Various alternative solution concepts, e.g., Correlated Equilibrium (CE), and learning methods, e.g., fictitious play (FP), have been proposed to approximate NE. For convenience, we refer to these methods as "inexact solvers", or "solvers" for short. However, the alternative solution concepts differ from NE, and the learning methods generally fail to converge to NE. Therefore, in this work, we propose the REinforcement Nash Equilibrium Solver (RENES), which trains a single policy to modify games of different sizes and applies the solvers to the modified games, where the obtained solutions are evaluated on the original games. Specifically, our contributions are threefold. i) We represent games as $\alpha$-rank response graphs and leverage a graph neural network (GNN) to handle games of different sizes as inputs; ii) we use tensor decomposition, e.g., canonical polyadic (CP) decomposition, to fix the dimension of the modifying actions across games of different sizes; iii) we train the modifying strategy with the widely used proximal policy optimization (PPO) and apply the solvers to the modified games, where the obtained solutions are evaluated on the original games. Extensive experiments on large-scale normal-form games show that our method further improves the NE approximation quality of different solvers, i.e., $\alpha$-rank, CE, FP, and projected replicator dynamics (PRD), and generalizes to unseen games.
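To make the modify-solve-evaluate pipeline concrete, the following NumPy sketch illustrates one such step on a two-player normal-form game. It is a minimal sketch under simplifying assumptions: the GNN encoder, $\alpha$-rank response graph, and PPO-trained policy are omitted, a random CP-style low-rank perturbation stands in for the learned modifying action, and all names (`fictitious_play`, `nash_conv`, `cp_modification`) are ours for illustration, not the authors' implementation.

```python
# Minimal sketch of one RENES-style step on a bimatrix game, using NumPy.
# Assumed/illustrative: the CP-style perturbation is random here; in RENES
# its coefficients would come from the PPO-trained modifying policy.
import numpy as np

rng = np.random.default_rng(0)

def fictitious_play(A, B, iters=2000):
    """Inexact solver: fictitious play on a bimatrix game (A, B)."""
    m, n = A.shape
    x_counts = np.ones(m)  # row player's empirical action counts
    y_counts = np.ones(n)  # column player's empirical action counts
    for _ in range(iters):
        x, y = x_counts / x_counts.sum(), y_counts / y_counts.sum()
        x_counts[np.argmax(A @ y)] += 1  # row best response to y
        y_counts[np.argmax(x @ B)] += 1  # column best response to x
    return x_counts / x_counts.sum(), y_counts / y_counts.sum()

def nash_conv(A, B, x, y):
    """NE approximation quality: sum of unilateral best-response gains."""
    return (np.max(A @ y) - x @ A @ y) + (np.max(x @ B) - x @ B @ y)

def cp_modification(shape, rank=3, scale=0.1):
    """CP-style rank-`rank` perturbation: a fixed number of coefficient
    vectors regardless of the game's size (drawn at random here)."""
    m, n = shape
    delta = np.zeros((m, n))
    for _ in range(rank):
        u, v, w = rng.normal(size=m), rng.normal(size=n), rng.normal()
        delta += w * np.outer(u, v)
    return scale * delta

# Original game: a random 10x10 general-sum bimatrix game.
A = rng.normal(size=(10, 10))
B = rng.normal(size=(10, 10))

# Baseline: apply the solver to the original game directly.
x0, y0 = fictitious_play(A, B)
print("NashConv on original game:", nash_conv(A, B, x0, y0))

# RENES-style step: modify the game, solve the modified game, then
# evaluate the resulting strategy back on the ORIGINAL game.
dA, dB = cp_modification(A.shape), cp_modification(B.shape)
x1, y1 = fictitious_play(A + dA, B + dB)
print("NashConv after modification:", nash_conv(A, B, x1, y1))
```

The key design point the sketch mirrors is that the perturbation has a fixed parameter count (here $3 \times (m + n + 1)$ coefficients for rank 3) regardless of the game's size, which is what lets a single policy emit modifying actions for games of different sizes; the solver only ever sees the modified payoffs, while NashConv is always measured on the original game.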