Mirror play (MP) is a widely adopted primal-dual multi-agent learning algorithm in which all agents simultaneously implement mirror descent in a distributed fashion. The advantage of MP over vanilla gradient play lies in its use of mirror maps that better exploit the geometry of the decision domains. Despite an extensive literature on the asymptotic convergence of MP to equilibrium, the understanding of MP's finite-time behavior before reaching equilibrium remains rudimentary. To facilitate the study of MP's non-equilibrium performance, this work establishes an equivalence between MP's finite-time primal-dual path (the mirror path) in monotone games and the closed-loop Nash equilibrium path of a finite-horizon differential game, referred to as the mirror differential game (MDG). Our construction of the MDG rests on the Brezis-Ekeland variational principle, and the stage cost functional of the MDG is the Fenchel coupling between MP's iterates and the associated gradient updates. This variational interpretation of the mirror path in static games as the equilibrium path of the MDG holds in both deterministic and stochastic cases. Such an interpretation translates the non-equilibrium study of learning dynamics into a more tractable equilibrium analysis of dynamic games, as demonstrated in a case study on the Cournot game, where the MP dynamics correspond to a linear-quadratic game.
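To make the setting concrete, the following is a minimal sketch of mirror play in a symmetric two-player Cournot game. It is not the paper's construction; the payoff constants, step size, and the choice of the negative-entropy mirror map on the positive orthant (which turns the mirror step into an exponentiated-gradient update) are illustrative assumptions.

```python
import numpy as np

# Symmetric 2-player Cournot game with inverse demand p(Q) = a - b*Q
# and zero marginal cost.  Player i minimizes f_i(q) = -q_i * (a - b*Q),
# whose unique Nash equilibrium is q* = a / (3b).
# Each player runs mirror descent with the negative-entropy mirror map
# on the positive orthant; the resulting primal update is multiplicative:
#     q_i <- q_i * exp(-eta * grad_i).
# All constants below are illustrative, not taken from the paper.
a, b, eta = 12.0, 1.0, 0.05
q = np.array([1.0, 1.0])            # initial quantities

for _ in range(2000):
    Q = q.sum()
    grad = -(a - b * Q - b * q)     # d f_i / d q_i, computed simultaneously
    q = q * np.exp(-eta * grad)     # entropic mirror step for both players

print(q)  # both quantities approach q* = a / (3b) = 4
```

The simultaneous, distributed nature of MP is visible in the loop: both players evaluate their gradients at the same joint profile before either one updates.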