An $\alpha$-potential game is a multi-player non-cooperative interaction in which a global potential function approximates individual player rewards up to a structural bias $\alpha$. While identifying a Nash Equilibrium (NE) in general-sum games is known to be computationally intractable, the potential game structure makes NE identification tractable. In this paper, we study offline learning of NE in $\alpha$-potential games using KL regularization. To analyze this process, we propose a novel Reference-Anchored offline data coverage framework: a verifiable condition that anchors data requirements to a known reference policy rather than to an unknown optimum. Building on this, we propose Offline Potential Mirror Descent (OPMD), a decentralized algorithm that achieves an accelerated $\widetilde{\mathcal{O}}(1/n)$ statistical rate, improving on the standard $\widetilde{\mathcal{O}}(1/\sqrt{n})$ rate typical of offline multi-agent learning. This work presents the first fast-rate offline learning approach for $\alpha$-potential games.