The primary objective of Multi-Agent Pathfinding (MAPF) is to plan efficient, conflict-free paths for all agents. Traditional multi-agent path-planning algorithms struggle to achieve efficient distributed planning for many agents. In contrast, Multi-Agent Reinforcement Learning (MARL) has proven to be an effective approach to this objective. By modeling the MAPF problem as a MARL problem, agents can plan paths and avoid collisions efficiently through distributed policies under partial observation. However, MARL policies often lack cooperation among agents due to the absence of global information, which in turn reduces MAPF efficiency. To address this challenge, this letter introduces a novel reward-shaping technique based on Independent Q-Learning (IQL). The method evaluates the influence of one agent on its neighbors and integrates this interaction into the reward function, inducing active cooperation among agents while remaining fully distributed. The proposed approach is evaluated in experiments across scenarios of varying scale and agent count, and the results are compared with those of other state-of-the-art (SOTA) planners. The evidence suggests that the proposed approach is comparable to these planners on most metrics and outperforms them in scenarios with a large number of agents.
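To make the core idea concrete, the following is a minimal sketch of IQL with a shaped reward. The influence term here is a hypothetical proxy (a penalty for occupying cells adjacent to neighbors, as a stand-in for blocking them); the paper's actual influence measure and hyperparameters (`alpha`, learning rate, discount) are not specified in this abstract and are assumed for illustration.

```python
import numpy as np

def shaped_reward(base_reward, agent_pos, neighbor_positions, alpha=0.5):
    """Shape an agent's reward with its influence on neighbors.

    Hypothetical influence term: count neighbors within Manhattan
    distance 1 (a crude proxy for blocking their paths) and subtract
    a weighted penalty, so agents learn to yield to each other.
    """
    influence = sum(
        1.0
        for n in neighbor_positions
        if abs(agent_pos[0] - n[0]) + abs(agent_pos[1] - n[1]) <= 1
    )
    return base_reward - alpha * influence

class IQLAgent:
    """Independent Q-Learning: each agent keeps its own Q-table and
    treats the other agents as part of the environment."""

    def __init__(self, n_states, n_actions, lr=0.1, gamma=0.95):
        self.q = np.zeros((n_states, n_actions))
        self.lr = lr
        self.gamma = gamma

    def update(self, s, a, r, s_next):
        # Standard one-step Q-learning update on the (shaped) reward r.
        td_target = r + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.lr * (td_target - self.q[s, a])
```

In this sketch, each agent would call `shaped_reward` on its environment reward before `update`, so cooperation is encouraged purely through local observations of neighbors, keeping training and execution distributed.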