Confidence-Based Curriculum Learning for Multi-Agent Path Finding

A wide range of real-world applications can be formulated as the Multi-Agent Path Finding (MAPF) problem, where the goal is to find collision-free paths for multiple agents, each with its own start and goal location. State-of-the-art MAPF solvers are mainly centralized and depend on global information. This limits their scalability and their flexibility in handling changes or new maps, which would require expensive replanning. Multi-agent reinforcement learning (MARL) offers an alternative by learning decentralized policies that can generalize over a variety of maps. While some prior works attempt to connect both areas, the proposed techniques are heavily engineered and complex because they integrate many mechanisms, which limits their generality and makes them expensive to use. We argue that much simpler and more general approaches are needed to bring MARL and MAPF closer together at significantly lower cost. In this paper, we propose Confidence-based Auto-Curriculum for Team Update Stability (CACTUS) as a lightweight MARL approach to MAPF. CACTUS defines a simple reverse curriculum scheme in which the goal of each agent is randomly placed within an allocation radius around the agent's start location. The allocation radius increases gradually as all agents improve, which is assessed by a confidence-based measure. We evaluate CACTUS on various maps of different sizes, obstacle densities, and numbers of agents. Our experiments demonstrate better performance and generalization capabilities than state-of-the-art MARL approaches while using fewer than 600,000 trainable parameters, which is less than 5% of the neural network size of current MARL approaches to MAPF.
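The reverse curriculum described above can be illustrated with a small sketch. The abstract does not specify the distance metric, the confidence measure, or the growth schedule, so everything below is a hypothetical illustration, not the CACTUS implementation. It assumes a 4-connected grid encoded as strings ('.' free, '#' obstacle), measures the allocation radius as shortest-path (BFS) distance so that every sampled goal is reachable, and uses each agent's success rate over a sliding window of episodes as a stand-in confidence measure. The radius grows by one only when every agent's confidence meets a threshold. The names `goals_within_radius`, `RadiusCurriculum`, `window`, and `threshold` are illustrative.

```python
from collections import deque
import random


def goals_within_radius(grid, start, radius):
    """Return free cells whose BFS distance from `start` lies in 1..radius.

    Assumed encoding: `grid` is a list of equal-length strings, '.' free, '#' obstacle.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    frontier = deque([start])
    while frontier:
        r, c = frontier.popleft()
        if dist[(r, c)] == radius:
            continue  # do not expand beyond the allocation radius
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == '.' and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                frontier.append((nr, nc))
    return [cell for cell, d in dist.items() if d >= 1]


class RadiusCurriculum:
    """Hypothetical confidence-gated curriculum: widen the radius once all agents are confident."""

    def __init__(self, num_agents, radius=1, max_radius=20, window=50, threshold=0.9):
        self.radius = radius
        self.max_radius = max_radius
        self.window = window
        self.threshold = threshold
        # One sliding window of 0/1 episode outcomes per agent.
        self.history = [deque(maxlen=window) for _ in range(num_agents)]

    def confidence(self, agent):
        """Assumed confidence measure: success rate over the agent's recent episodes."""
        h = self.history[agent]
        return sum(h) / len(h) if h else 0.0

    def record(self, successes):
        """Log per-agent goal-reaching outcomes for one episode; maybe grow the radius."""
        for h, ok in zip(self.history, successes):
            h.append(1 if ok else 0)
        windows_full = all(len(h) == self.window for h in self.history)
        all_confident = all(self.confidence(i) >= self.threshold for i in range(len(self.history)))
        if windows_full and all_confident:
            self.radius = min(self.radius + 1, self.max_radius)
            for h in self.history:
                h.clear()  # re-estimate confidence at the new difficulty level

    def sample_goal(self, grid, start, rng, taken=frozenset()):
        """Pick a random goal within the current radius, skipping goals already taken."""
        candidates = [c for c in goals_within_radius(grid, start, self.radius) if c not in taken]
        return rng.choice(candidates) if candidates else start  # fall back to staying put


grid = ["....",
        ".##.",
        "...."]
cur = RadiusCurriculum(num_agents=2, window=3, threshold=1.0)
for _ in range(3):
    cur.record([True, True])
print(cur.radius)  # → 2
print(cur.sample_goal(grid, (0, 0), random.Random(0)))
```

Gating on the weakest agent (rather than the team average) is one way to read "as all agents improve"; clearing the windows after each increase prevents the radius from jumping several steps on stale statistics. Both are design assumptions of this sketch.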
