Learning Graph-Enhanced Commander-Executor for Multi-Agent Navigation

This paper investigates the multi-agent navigation problem, in which multiple agents must reach their target goals within a limited time. Multi-agent reinforcement learning (MARL) has shown promising results on this problem. However, it is inefficient for MARL to directly explore a (nearly) optimal policy in a large search space, and the inefficiency worsens as the number of agents increases (e.g., 10+ agents) or the environment becomes more complex (e.g., a 3D simulator). Goal-conditioned hierarchical reinforcement learning (HRL) offers a promising way to tackle this challenge: it introduces a hierarchical structure that decomposes the search space, where the low-level policy predicts primitive actions under the guidance of goals derived from the high-level policy. In this paper, we propose Multi-Agent Graph-Enhanced Commander-Executor (MAGE-X), a graph-based goal-conditioned hierarchical method for multi-agent navigation tasks. MAGE-X comprises a high-level Goal Commander and a low-level Action Executor. The Goal Commander predicts a probability distribution over goals and uses it to assign each agent the most appropriate final target. The Action Executor uses graph neural networks (GNNs) to construct a subgraph for each agent that contains only its crucial partners, which improves cooperation. Additionally, the Goal Encoder in the Action Executor captures the relationship between the agent and its designated goal to encourage the agent to reach the final target. The results show that MAGE-X outperforms state-of-the-art MARL baselines, achieving a 100% success rate with only 3 million training steps in multi-agent particle environments (MPE) with 50 agents, and at least a 12% higher success rate and 2x higher data efficiency in a more complicated quadrotor 3D navigation task.
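
To make the two-level decomposition concrete, the following is a minimal toy sketch and not the MAGE-X implementation. It assumes the high level receives a hypothetical matrix `goal_probs`, where `goal_probs[i][j]` is the probability that agent `i` should take goal `j`, and picks the one-to-one assignment with the highest joint probability by brute force. For the low level, it substitutes a simple k-nearest-neighbour rule for the learned GNN subgraph of "crucial partners". The function names `assign_goals` and `partner_subgraph`, the matrix layout, and the distance-based partner rule are all assumptions introduced for illustration; the abstract does not specify any of them.

```python
from itertools import permutations
from math import dist, log


def assign_goals(goal_probs):
    """Toy high-level step: one-to-one agent -> goal assignment.

    goal_probs[i][j] is a hypothetical probability that agent i takes goal j.
    Returns a list where entry i is the goal index assigned to agent i,
    chosen to maximize the sum of log-probabilities (i.e. the joint product).
    Brute force over permutations: only practical for a handful of agents.
    """
    n = len(goal_probs)
    eps = 1e-12  # avoids log(0) for zero-probability entries
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(log(goal_probs[i][perm[i]] + eps) for i in range(n)),
    )
    return list(best)


def partner_subgraph(positions, agent, k):
    """Toy low-level step: indices of the k agents nearest to `agent`.

    This distance rule is an assumed stand-in for the learned subgraph
    of crucial partners; it is not how a GNN would select them.
    """
    others = [j for j in range(len(positions)) if j != agent]
    others.sort(key=lambda j: dist(positions[agent], positions[j]))
    return others[:k]


# Agents 0 and 1 both prefer goal 0, so a per-agent argmax would collide;
# the joint assignment resolves the conflict.
probs = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.2, 0.2, 0.6],
]
print(assign_goals(probs))  # -> [0, 1, 2]

positions = [(0, 0), (1, 0), (5, 5), (0, 2)]
print(partner_subgraph(positions, agent=0, k=2))  # -> [1, 3]
```

The brute-force search grows factorially with the number of agents, so a setting such as the 50-agent MPE task would need a polynomial-time assignment solver (for example, the Hungarian algorithm) or a learned scheduler in its place.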
