We formulate a Markov potential game with final-time reach-avoid objectives by integrating potential game theory with stochastic reach-avoid control. Our focus is on multi-player trajectory planning in which all players maximize a common reach-avoid objective: the probability that every player reaches its designated target state by a specified time while avoiding collisions with the others. Existing approaches require centralized computation of actions via a global policy, which may incur prohibitive communication costs. Instead, we focus on approximating the global policy with local state-feedback policies. First, we adapt the recursive single-player reach-avoid value iteration to the multi-player framework with local policies, and show that the same recursion holds on the joint state space. To find each player's optimal local policy, the multi-player reach-avoid value function is projected from the joint state to the local state using the other players' occupancy measures. Then, we propose an iterative best-response scheme under which the multi-player value iteration converges to a pure Nash equilibrium. We demonstrate the utility of our approach in finding collision-free policies for multi-player motion planning in simulation.
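The pipeline sketched in the abstract (a common reach-avoid objective, a backward value recursion that zeroes out collision states, and iterative best responses converging to a pure Nash equilibrium) can be illustrated on a toy problem. The sketch below is our own illustrative construction, not the paper's method: dynamics are deterministic on a small ring world, best responses are computed on the joint state rather than via the occupancy-measure projection, and the greedy initialization, horizon, and target cells are arbitrary choices.

```python
from itertools import product

N, T = 5, 4                    # ring of 5 cells, horizon 4 (illustrative)
GOALS = {0: 3, 1: 2}           # player 0 targets cell 3, player 1 targets cell 2
START = (0, 1)
ACTS = (-1, 0, 1)              # move counterclockwise, stay, or clockwise

def step(x, a):
    return (x + a) % N

def rollout_value(policies):
    """Reach-avoid payoff of a joint policy profile: 1 if both players are at
    their targets at time T and never occupy the same cell, else 0."""
    x = START
    for t in range(T):
        if x[0] == x[1]:
            return 0
        x = tuple(step(x[i], policies[i][(t, x)]) for i in (0, 1))
    return int(x[0] != x[1] and x[0] == GOALS[0] and x[1] == GOALS[1])

def best_response(i, pol_other, j):
    """Backward induction for player i on the joint state, holding player j's
    policy fixed; collision states are assigned value 0 at every stage."""
    V = {x: int(x[0] != x[1] and x[0] == GOALS[0] and x[1] == GOALS[1])
         for x in product(range(N), repeat=2)}
    pol = {}
    for t in reversed(range(T)):
        Vt = {}
        for x in product(range(N), repeat=2):
            if x[0] == x[1]:
                Vt[x] = 0
                continue
            best, arg = -1, 0
            for a in ACTS:
                nxt = [0, 0]
                nxt[i] = step(x[i], a)
                nxt[j] = step(x[j], pol_other[(t, x)])
                v = V[tuple(nxt)]
                if v > best:
                    best, arg = v, a
            Vt[x] = best
            pol[(t, x)] = arg
        V = Vt
    return pol

# Greedy initialization: each player moves clockwise toward its own goal,
# ignoring the other; here this causes a collision, so the initial payoff is 0.
pols = [{(t, x): (1 if x[i] != GOALS[i] else 0)
         for t in range(T) for x in product(range(N), repeat=2)}
        for i in (0, 1)]
print(rollout_value(pols))   # 0: both greedy paths meet at cell 2

# Iterative best response: alternate best responses; since the game has a
# common payoff, the payoff is monotone and the profile settles at a pure NE.
for _ in range(3):
    pols[0] = best_response(0, pols[1], 1)
    pols[1] = best_response(1, pols[0], 0)
print(rollout_value(pols))   # 1: a collision-free profile reaching both targets
```

In this example, player 0's best response reroutes counterclockwise around the ring to avoid player 1, which parks on its target cell; the common payoff makes each best response a step of coordinate ascent on the potential, which is why the alternation terminates at a pure equilibrium.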