Multi-Agent Reinforcement Learning (MARL) based Multi-Agent Path Finding (MAPF) has recently gained attention due to its efficiency and scalability. Several MARL-MAPF methods use communication to enrich the information each agent can perceive. However, existing works still struggle in structured environments with high obstacle density and large numbers of agents. To further improve the performance of communication-based MARL-MAPF solvers, we propose a new method, Ensembling Prioritized Hybrid Policies (EPH). We first propose a selective communication block that gathers richer information for better agent coordination in multi-agent environments, and train the model with a Q-learning-based algorithm. We then introduce three advanced inference strategies aimed at bolstering performance during the execution phase. First, we hybridize the neural policy with single-agent expert guidance for navigating conflict-free zones. Second, we propose Q value-based methods for prioritized resolution of conflicts as well as deadlock situations. Finally, we introduce a robust ensemble method that efficiently selects the best among multiple candidate solutions. We empirically evaluate EPH in complex multi-agent environments and demonstrate competitive performance against state-of-the-art neural methods for MAPF.
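The Q value-based prioritization described above can be sketched as follows. This is a minimal illustrative example, not EPH's actual implementation: all names, data structures, and the fallback choice (waiting in place) are assumptions made for clarity. The idea is that when several agents contend for the same cell, the agent whose chosen action has the highest Q-value keeps it, and the others fall back.

```python
def resolve_conflicts(proposals, q_values):
    """Resolve vertex conflicts by priority on Q-values (illustrative sketch).

    proposals: dict mapping agent_id -> target cell the agent wants to enter
    q_values:  dict mapping agent_id -> Q-value of the agent's chosen action
    Returns a dict mapping agent_id -> granted cell, or None if the agent
    must fall back (here modeled as waiting in place).
    """
    # Group agents by the cell they are trying to enter.
    by_cell = {}
    for agent, cell in proposals.items():
        by_cell.setdefault(cell, []).append(agent)

    resolved = {}
    for cell, agents in by_cell.items():
        # The agent with the highest Q-value wins the contested cell;
        # the remaining agents yield and wait.
        winner = max(agents, key=lambda a: q_values[a])
        for agent in agents:
            resolved[agent] = cell if agent == winner else None
    return resolved
```

For example, if agents "a" and "b" both propose cell (1, 1) with Q-values 0.9 and 0.4, agent "a" is granted the cell and "b" yields; an uncontested agent keeps its proposal unchanged.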