An $\alpha$-potential game is a multi-player non-cooperative interaction in which a global potential function approximates individual player rewards up to a structural bias $\alpha$. While identifying a Nash Equilibrium (NE) in general-sum games is known to be computationally intractable, the potential game structure enables tractable NE identification. In this paper, we study offline learning of NE in $\alpha$-potential games with KL regularization. To analyze this setting, we propose a novel Reference-Anchored offline data coverage framework: a verifiable condition that anchors data requirements to a known reference policy rather than an unknown optimum. Building on this, we propose Offline Potential Mirror Descent (OPMD), a decentralized algorithm that achieves an accelerated $\widetilde{\mathcal{O}}(1/n)$ statistical rate, improving on the $\widetilde{\mathcal{O}}(1/\sqrt{n})$ rate typical of offline multi-agent learning. This work establishes the first fast-rate offline learning approach for $\alpha$-potential games.
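For concreteness, the approximation described in the first sentence is usually formalized through unilateral deviations. The following is a sketch of the standard definition from the $\alpha$-potential game literature, not necessarily the exact form used in this paper; the symbols $u_i$ (player $i$'s reward), $\Phi$ (the potential function), and $\pi_i, \pi_i', \pi_{-i}$ (player $i$'s policy, a deviation, and the other players' policies) are notation assumed here for illustration.

$$
\Bigl| \bigl(u_i(\pi_i', \pi_{-i}) - u_i(\pi_i, \pi_{-i})\bigr) - \bigl(\Phi(\pi_i', \pi_{-i}) - \Phi(\pi_i, \pi_{-i})\bigr) \Bigr| \le \alpha
\quad \text{for all players } i \text{ and all } \pi_i, \pi_i', \pi_{-i}.
$$

Under this formulation, $\alpha = 0$ recovers an exact potential game, in which every unilateral change in a player's reward is matched exactly by the change in $\Phi$.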