Distributed Multi-Agent Path Finding (MAPF) integrated with Multi-Agent Reinforcement Learning (MARL) has emerged as a prominent research focus, enabling real-time cooperative decision-making in partially observable environments through inter-agent communication. However, due to insufficient collaborative and perceptual capabilities, existing methods scale poorly across diverse environmental conditions. To address these challenges, we propose PC2P, a novel distributed MAPF method built on a Q-learning-based MARL framework. First, we introduce a personalization-enhanced communication mechanism based on dynamic graph topology, which resolves the core questions of ``who'' to communicate with and ``what'' to transmit through three-stage operations: selection, generation, and aggregation. Meanwhile, we incorporate local crowd perception to enrich agents' heuristic observations, strengthening the model's guidance toward effective actions by integrating static spatial constraints with dynamic occupancy changes. To resolve extreme deadlock cases, we propose a region-based deadlock-breaking strategy that leverages expert guidance to coordinate agents efficiently within confined areas. Experimental results demonstrate that PC2P outperforms state-of-the-art distributed MAPF methods across varied environments. Ablation studies further confirm the contribution of each module to overall performance.
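The selection–generation–aggregation communication round described above can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: the communication radius, the `W_msg` projection, and the dot-product attention scores are hypothetical stand-ins for learned components.

```python
import numpy as np

def communicate(positions, features, radius=2.0, rng=None):
    """One communication round over a dynamic graph topology (illustrative sketch)."""
    rng = rng or np.random.default_rng(0)
    n, d = features.shape
    # Stand-in for a learned message encoder (assumption, not the paper's network).
    W_msg = rng.standard_normal((d, d)) / np.sqrt(d)

    # 1) Selection ("who"): build the dynamic graph from pairwise distances.
    dists = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    adj = (dists <= radius) & ~np.eye(n, dtype=bool)

    # 2) Generation ("what"): each agent encodes its features into a message.
    messages = features @ W_msg

    # 3) Aggregation: softmax-weighted sum of neighbor messages, added residually.
    out = features.copy()
    for i in range(n):
        nbrs = np.flatnonzero(adj[i])
        if nbrs.size == 0:
            continue  # isolated agent: no incoming messages
        scores = messages[nbrs] @ features[i]      # dot-product relevance scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = features[i] + weights @ messages[nbrs]
    return out
```

Agents outside every neighbor's radius keep their original features, reflecting that the dynamic topology determines who participates in each round.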