We investigate decentralized multi-agent navigation, in which multiple agents must reach initially unassigned targets within a limited time. Classical planning-based methods incur expensive computation at every step and have limited expressiveness for complex cooperation strategies. In contrast, reinforcement learning (RL) has recently become a popular paradigm for this problem. However, RL suffers from low data efficiency and poor cooperation when directly exploring (nearly) optimal policies in a large search space, especially as the number of agents grows (e.g., 10+ agents) or in complex environments (e.g., 3D simulators). In this paper, we propose the Multi-Agent Scalable GNN-based Planner (MASP), a goal-conditioned hierarchical planner for navigation tasks with a substantial number of agents. MASP adopts a hierarchical framework that divides a large search space into multiple smaller spaces, thereby reducing space complexity and accelerating training convergence. We also leverage graph neural networks (GNNs) to model the interactions between agents and goals, improving goal achievement. In addition, to enhance generalization to unseen team sizes, we divide agents into multiple groups, each containing a number of agents seen during training. The results demonstrate that MASP outperforms classical planning-based competitors and RL baselines, achieving a nearly 100% success rate with minimal training data in both the multi-agent particle environment (MPE) with 50 agents and a 3D quadrotor environment (OmniDrones) with 20 agents. Furthermore, the learned policy exhibits zero-shot generalization to unseen team sizes.
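The abstract does not say how agents are split into groups of trained size, so the following is only a hypothetical sketch of that idea, not MASP's actual procedure. It assumes a set of team sizes seen during training (`trained_sizes`), writes the unseen team size as a sum of those sizes using as few groups as possible (a coin-change dynamic program), and then forms spatially coherent groups by ordering agents by polar angle around the team centroid and cutting that order into contiguous chunks. The helper names `split_team_size` and `group_agents` and the angle-based heuristic are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def split_team_size(n_agents: int, trained_sizes: Sequence[int]) -> List[int]:
    """Hypothetical helper: express n_agents as a sum of trained group sizes
    with the fewest groups (coin-change DP). Raises if no split exists."""
    INF = float("inf")
    best = [0] + [INF] * n_agents   # best[m] = fewest groups summing to m agents
    choice = [0] * (n_agents + 1)   # size of the last group used to reach best[m]
    for m in range(1, n_agents + 1):
        for s in trained_sizes:
            if s <= m and best[m - s] + 1 < best[m]:
                best[m] = best[m - s] + 1
                choice[m] = s
    if best[n_agents] == INF:
        raise ValueError(
            f"{n_agents} agents cannot be split into sizes {list(trained_sizes)}"
        )
    # Walk back through the recorded choices to recover the group sizes.
    sizes, m = [], n_agents
    while m > 0:
        sizes.append(choice[m])
        m -= choice[m]
    return sorted(sizes, reverse=True)


def group_agents(
    positions: Sequence[Tuple[float, float]], trained_sizes: Sequence[int]
) -> List[List[int]]:
    """Hypothetical grouping heuristic (not from the paper): sort agents by
    polar angle around the team centroid, then cut that ordering into
    contiguous chunks whose sizes were all seen during training."""
    if not positions:
        return []
    n = len(positions)
    cx = sum(p[0] for p in positions) / n
    cy = sum(p[1] for p in positions) / n
    order = sorted(
        range(n),
        key=lambda i: math.atan2(positions[i][1] - cy, positions[i][0] - cx),
    )
    groups, start = [], 0
    for size in split_team_size(n, trained_sizes):
        groups.append(order[start:start + size])
        start += size
    return groups


if __name__ == "__main__":
    # 13 agents on a unit circle; assume the policy was trained with 3 or 5 agents.
    pts = [
        (math.cos(2 * math.pi * k / 13), math.sin(2 * math.pi * k / 13))
        for k in range(13)
    ]
    groups = group_agents(pts, trained_sizes=[3, 5])
    print([len(g) for g in groups])  # [5, 5, 3]
```

Under this assumption, each group can then be handled by a policy trained for exactly that team size, which is one way a planner could act on team sizes never seen during training. Minimizing the number of groups is a design choice made here for illustration; other splits (e.g., preferring balanced group sizes) are equally plausible.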