This paper addresses the problem of multi-agent pursuit, in which slow pursuers cooperate to capture fast evaders in a confined environment with obstacles. Existing heuristic algorithms often lack expressive coordination strategies and are highly sensitive to task conditions, so they require extensive hyperparameter tuning. Reinforcement learning (RL) has also been applied to this problem and can learn cooperative pursuit strategies. However, RL-based methods are hard to train in complex scenarios because they need vast amounts of training data and adapt poorly to varying task conditions, such as different scene sizes, varying numbers and speeds of obstacles, and flexible evader-to-pursuer speed ratios. In this work, we combine RL and curriculum learning to introduce a flexible solver for multi-agent pursuit problems, named TaskFlex Solver (TFS), which can solve multi-agent pursuit problems with diverse and dynamically changing task conditions in both 2-dimensional and 3-dimensional scenarios. TFS uses a curriculum learning method that constructs task distributions based on training progress, improving both training efficiency and final performance. Our algorithm consists of two main components: the Task Evaluator, which evaluates task success rates and selects tasks of moderate difficulty to maintain a curriculum archive, and the Task Sampler, which constructs training distributions by sampling tasks from the curriculum archive to maximize policy improvement. Experiments show that TFS substantially outperforms the baselines and achieves capture rates close to 100% in both 2-dimensional and 3-dimensional multi-agent pursuit problems with diverse and dynamically changing scenes. The project website is at https://sites.google.com/view/tfs-2023.
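The interplay between the Task Evaluator and the Task Sampler can be illustrated with a minimal sketch. It is not the TFS implementation: the `Task` fields, the success-rate thresholds (0.3 and 0.7), the parameter ranges, the exploration probability, and the uniform sampling rule are all assumptions chosen for illustration, since the abstract states only that moderately difficult tasks are archived and that training tasks are sampled from that archive.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    """Hypothetical task description; field names and ranges are assumptions."""
    scene_size: float
    num_obstacles: int
    speed_ratio: float  # evader speed divided by pursuer speed


@dataclass
class TaskEvaluator:
    """Maintains an archive of tasks whose success rate is moderate."""
    low: float = 0.3    # assumed lower bound for "moderate difficulty"
    high: float = 0.7   # assumed upper bound for "moderate difficulty"
    archive: list = field(default_factory=list)

    def update(self, task: Task, success_rate: float) -> bool:
        """Add or evict a task based on its measured success rate."""
        moderate = self.low <= success_rate <= self.high
        if moderate and task not in self.archive:
            self.archive.append(task)
        elif not moderate and task in self.archive:
            self.archive.remove(task)
        return moderate


class TaskSampler:
    """Builds the training distribution from the archive plus random tasks."""

    def __init__(self, evaluator: TaskEvaluator, explore_prob: float = 0.3, seed: int = 0):
        self.evaluator = evaluator
        self.explore_prob = explore_prob  # assumed share of fresh random tasks
        self.rng = random.Random(seed)

    def random_task(self) -> Task:
        # Parameter ranges are placeholders, not values from the paper.
        return Task(
            scene_size=self.rng.uniform(1.0, 4.0),
            num_obstacles=self.rng.randint(0, 5),
            speed_ratio=self.rng.uniform(1.0, 2.0),
        )

    def sample(self) -> Task:
        archive = self.evaluator.archive
        # Fall back to a random task when the archive is empty or when exploring.
        if not archive or self.rng.random() < self.explore_prob:
            return self.random_task()
        return self.rng.choice(archive)


if __name__ == "__main__":
    evaluator = TaskEvaluator()
    sampler = TaskSampler(evaluator)
    task = sampler.sample()        # archive is empty, so this is a random task
    evaluator.update(task, 0.5)    # pretend the policy captured in 50% of episodes
    print(len(evaluator.archive), sampler.sample())
```

In a training loop, each sampled task would be rolled out with the current policy, its measured capture rate passed back to `update`, and the archive would then drift toward tasks at the edge of the policy's current ability.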