This paper addresses multi-UAV pursuit-evasion, in which a group of drones cooperates to capture a fast evader in a confined environment with obstacles. Existing heuristic algorithms, which simplify the pursuit-evasion problem, often lack expressive coordination strategies and struggle to capture the evader in extreme scenarios, such as when the evader moves at high speed. Reinforcement learning (RL), in contrast, has been applied to this problem and has the potential to yield highly cooperative capture strategies. However, RL-based methods are hard to train in complex three-dimensional scenarios with diverse task settings because the exploration space is vast, and the dynamics constraints of drones further limit the ability of RL to acquire high-performance capture strategies. In this work, we introduce a dual curriculum learning framework, named DualCL, which addresses multi-UAV pursuit-evasion in diverse environments and demonstrates zero-shot transfer to unseen scenarios. DualCL comprises two main components: the Intrinsic Parameter Curriculum Proposer, which progressively suggests intrinsic parameters from easy to hard to improve the capture capability of the drones, and the External Environment Generator, which explores unresolved scenarios and generates appropriate training distributions of external environment parameters. Simulation results show that DualCL significantly outperforms baseline methods, achieving a capture rate above 90% and reducing the number of timesteps needed for capture by at least 27.5% in the training scenarios. It also exhibits the best zero-shot generalization in unseen environments. Moreover, we demonstrate that our pursuit strategy transfers from simulation to real-world environments. Further details can be found on the project website at https://sites.google.com/view/dualcl.
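To make the two-component structure concrete, the following is a minimal Python sketch of how an easy-to-hard intrinsic curriculum and a generator of unresolved environments could interact in one training loop. It is an illustration under stated assumptions, not the DualCL implementation: the class names, the promotion threshold, the archive rule, and the example parameters (`capture_radius` levels, `num_obstacles`, `evader_speed`) are all hypothetical, and `evaluate` stands in for training and evaluating a policy.

```python
import random
from dataclasses import dataclass, field


@dataclass
class IntrinsicCurriculum:
    """Hypothetical easy-to-hard schedule over a single intrinsic parameter."""
    levels: list              # ordered easy -> hard, e.g. assumed capture radii
    promote_at: float = 0.9   # assumed success-rate threshold for moving on
    index: int = 0

    def current(self):
        return self.levels[self.index]

    def update(self, success_rate):
        # Advance to the next (harder) level once the current one is mastered.
        if success_rate >= self.promote_at and self.index < len(self.levels) - 1:
            self.index += 1


@dataclass
class EnvironmentGenerator:
    """Hypothetical sampler that revisits environments not yet solved."""
    solved_at: float = 0.9     # assumed threshold for calling a scenario solved
    explore_prob: float = 0.3  # assumed chance of drawing a fresh scenario
    archive: list = field(default_factory=list)

    def sample(self, rng):
        # Explore a new random scenario, or replay an unresolved one.
        if not self.archive or rng.random() < self.explore_prob:
            return {"num_obstacles": rng.randint(0, 5),
                    "evader_speed": rng.uniform(1.0, 3.0)}
        return rng.choice(self.archive)

    def update(self, env, success_rate):
        solved = success_rate >= self.solved_at
        if not solved and env not in self.archive:
            self.archive.append(env)      # keep unresolved scenarios for replay
        elif solved and env in self.archive:
            self.archive.remove(env)      # drop scenarios that are now solved


def train_sketch(evaluate, iterations=100, seed=0):
    """Run both curricula; `evaluate(param, env)` is a placeholder that would
    train the policy and return its capture rate in [0, 1]."""
    rng = random.Random(seed)
    intrinsic = IntrinsicCurriculum(levels=[0.6, 0.4, 0.2])
    generator = EnvironmentGenerator()
    for _ in range(iterations):
        env = generator.sample(rng)
        rate = evaluate(intrinsic.current(), env)
        generator.update(env, rate)
        intrinsic.update(rate)
    return intrinsic, generator
```

In this sketch the intrinsic curriculum is promoted on the capture rate of a single sampled environment, which is a simplification chosen for brevity; a real system would more plausibly aggregate over many rollouts before tightening the intrinsic parameter.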