The Multi-Agent Path Finding (MAPF) problem entails finding collision-free paths for a set of agents, guiding them from their start locations to their goal locations. However, MAPF does not account for several practical task-related constraints. For example, agents may need to perform actions at goal locations with specific execution times, adhering to predetermined orders and timeframes. Moreover, goal assignments may not be predefined for agents, and the optimization objective may lack an explicit definition. To incorporate task assignment, path planning, and a user-defined objective into a coherent framework, this paper examines the Task Assignment and Path Finding with Precedence and Temporal Constraints (TAPF-PTC) problem. We augment Conflict-Based Search (CBS) to simultaneously generate task assignments and collision-free paths that adhere to precedence and temporal constraints, maximizing an objective quantified by the return from a user-defined reward function, as in reinforcement learning (RL). Experimentally, we demonstrate that our algorithm, CBS-TA-PTC, can solve highly challenging bomb-defusing tasks with precedence and temporal constraints efficiently relative to multi-agent reinforcement learning (MARL) and adapted Target Assignment and Path Finding (TAPF) methods.
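To make the two kinds of constraints in the abstract concrete, the following is a minimal Python sketch of how a candidate solution might be checked. It is not the CBS-TA-PTC algorithm. The function names, the grid-cell path representation, and the reading of a temporal constraint as a time window that must contain a task's whole execution are all assumptions made for illustration.

```python
from itertools import combinations


def position(path, t):
    """Cell occupied at timestep t; an agent is assumed to wait at its last cell."""
    return path[min(t, len(path) - 1)]


def first_conflict(paths):
    """Return (agent_i, agent_j, t) for the earliest vertex or swap conflict, else None.

    Each path is a list of grid cells, one per timestep (hypothetical layout).
    """
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        for i, j in combinations(range(len(paths)), 2):
            # Vertex conflict: both agents occupy the same cell at time t.
            if position(paths[i], t) == position(paths[j], t):
                return (i, j, t)
            # Swap (edge) conflict: the agents exchange cells between t and t + 1.
            if (position(paths[i], t) == position(paths[j], t + 1)
                    and position(paths[i], t + 1) == position(paths[j], t)):
                return (i, j, t)
    return None


def schedule_is_valid(start, duration, precedence, window):
    """Check precedence and temporal constraints on a task schedule.

    start:      task -> timestep at which execution begins
    duration:   task -> execution time at the goal location
    precedence: list of (before, after) pairs; `before` must finish first
    window:     task -> (earliest, latest) bounds on the whole execution (assumed)
    """
    for before, after in precedence:
        if start[before] + duration[before] > start[after]:
            return False
    for task, (earliest, latest) in window.items():
        if start[task] < earliest or start[task] + duration[task] > latest:
            return False
    return True
```

A CBS-style solver would typically call a check like `first_conflict` on the current set of paths and branch on the conflict it returns, while a check like `schedule_is_valid` would filter out assignments that violate the ordering or timing of tasks. For instance, `first_conflict([[(0, 0), (0, 1)], [(0, 1), (0, 0)]])` reports a swap between agents 0 and 1 at timestep 0.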