Convex Markov Games (cMGs) were recently introduced as a broad class of multi-agent learning problems that generalize Markov games to settings where strategic agents optimize general utilities beyond additive rewards. While cMGs expand the modeling frontier, their theoretical foundations, particularly the structure of Nash equilibria (NE) and guarantees for learning algorithms, are not yet well understood. In this work, we address these gaps for an extension of cMGs, which we term General Utility Markov Games (GUMGs), that captures new applications requiring coupling between agents' occupancy measures. We prove that in GUMGs, Nash equilibria coincide with the fixed points of projected pseudo-gradient dynamics (i.e., first-order stationary points), enabled by a novel agent-wise gradient domination property. This insight also yields a simple proof of NE existence via Brouwer's fixed-point theorem. We further establish the existence of Markov perfect equilibria. Building on this characterization, we derive a policy gradient theorem for GUMGs and design a model-free policy gradient algorithm. For potential GUMGs, we establish iteration complexity guarantees for computing an approximate NE under exact gradients and provide sample complexity bounds in both the generative-model and on-policy settings. Our results extend beyond prior work restricted to zero-sum cMGs, providing the first theoretical analysis of common-interest cMGs.
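The NE characterization stated above can be sketched schematically as follows; the notation here is assumed for illustration and is not taken from the abstract itself, with $\Pi_i$ denoting agent $i$'s policy set, $J_i$ its general utility, and $\eta > 0$ a step size:

```latex
% A joint policy \pi^\star is a Nash equilibrium of the GUMG
% iff it is a fixed point of the projected pseudo-gradient map,
% i.e., for every agent i,
\[
  \pi_i^\star
  = \operatorname{Proj}_{\Pi_i}\!\Big(
      \pi_i^\star + \eta\, \nabla_{\pi_i} J_i(\pi^\star)
    \Big),
\]
% equivalently, no agent can improve its utility to first order
% by a feasible unilateral policy deviation:
\[
  \big\langle \nabla_{\pi_i} J_i(\pi^\star),\;
  \pi_i - \pi_i^\star \big\rangle \le 0
  \quad \text{for all } \pi_i \in \Pi_i .
\]
```

The agent-wise gradient domination property mentioned in the abstract is what upgrades this first-order stationarity condition from a necessary condition to an exact characterization of Nash equilibria.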