We propose the concept of a Lagrangian game to solve constrained Markov games. Such games model scenarios in which, in addition to their individual rewards, agents face cost constraints; both rewards and costs depend on the agents' joint actions and on the environment state as it evolves over time. Constrained Markov games are the formal foundation of safe multiagent reinforcement learning, providing a structured model for dynamic multiagent interactions in a wide range of settings, such as autonomous teams operating under local energy and time constraints. We develop a primal-dual approach in which agents solve the Lagrangian game associated with the current Lagrange multiplier, simulate cost and reward trajectories over a fixed horizon, and update the multiplier using the accrued experience. This update yields a new Lagrangian game, initiating the next iteration. Our key result shows that the sequence of solutions to these Lagrangian games yields a nonstationary Nash solution of the original constrained Markov game.
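To make the primal-dual iteration concrete, the following is a minimal sketch of one plausible form of the loop described above. The subroutine names `solve_lagrangian_game` and `simulate`, the projected-subgradient multiplier update, and all parameter values are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def primal_dual_loop(solve_lagrangian_game, simulate, budgets,
                     step_size=0.1, horizon=100, iterations=50):
    """Schematic primal-dual iteration for a constrained Markov game.

    Assumptions (hypothetical interfaces, not from the paper):
      - solve_lagrangian_game(lam) returns a joint policy solving the
        Lagrangian game induced by the multiplier vector `lam`.
      - simulate(policy, horizon) returns the costs accrued along a
        fixed-horizon rollout of that policy, one entry per constraint.
    """
    lam = np.zeros_like(budgets)  # one multiplier per cost constraint
    policy = None
    for _ in range(iterations):
        # Primal step: solve the Lagrangian game for the current multiplier.
        policy = solve_lagrangian_game(lam)
        # Simulation step: roll out the policy to estimate accrued costs.
        costs = simulate(policy, horizon)
        # Dual step: projected subgradient ascent on the constraint violation;
        # this defines the next Lagrangian game.
        lam = np.maximum(0.0, lam + step_size * (costs - budgets))
    return policy, lam
```

Under these assumptions, the sequence of policies returned across iterations plays the role of the sequence of Lagrangian-game solutions whose limiting behavior the paper analyzes.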