Multi-agent pursuit-evasion tasks involving intelligent targets are notoriously challenging coordination problems. In this paper, we investigate new ways to learn coordinated behaviors for unmanned aerial vehicles (UAVs) that must keep track of multiple evasive targets. Within a Multi-Agent Reinforcement Learning (MARL) framework, we propose a variant of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) method. Our approach addresses multi-target pursuit-evasion scenarios in non-stationary, unknown environments with random obstacles. In addition, because collective exploration is critical for detecting possible targets, we assign heterogeneous roles to the pursuers so that exploratory actions are balanced against exploitation (i.e., tracking) of previously identified targets. The proposed role-based MADDPG algorithm not only tracks multiple targets but also explores for possible targets by means of a Voronoi-based reward policy. We implemented, tested, and validated our approach in a simulation environment before deploying it on a real-world multi-robot system comprising Crazyflie drones. Our results demonstrate that a multi-agent pursuit team can learn highly efficient coordinated control policies for target tracking and exploration, even when confronted with multiple fast evasive targets in complex environments.
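The abstract does not specify how the Voronoi-based reward is computed, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes the workspace is discretized into grid cells, each cell is assigned to its nearest pursuer (a discrete Voronoi partition), and a pursuer is rewarded for each previously unseen cell inside its own region that falls within its sensing range. The function names, the `sensing_radius` parameter, and the cell-count reward are all hypothetical.

```python
import numpy as np


def voronoi_owner(cells, pursuers):
    """Return, for each cell, the index of its nearest pursuer.

    cells: (M, 2) array of cell centers; pursuers: (N, 2) array of positions.
    This nearest-neighbor assignment is a discrete Voronoi partition.
    """
    dists = np.linalg.norm(cells[:, None, :] - pursuers[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


def exploration_rewards(cells, pursuers, visited, sensing_radius):
    """Hypothetical exploration reward (an assumption, not the paper's definition).

    Each pursuer earns one unit per unvisited cell that lies inside its own
    Voronoi region and within sensing_radius of its position.
    Returns the per-pursuer rewards and the updated visited mask.
    """
    dists = np.linalg.norm(cells[:, None, :] - pursuers[None, :, :], axis=2)
    owner = np.argmin(dists, axis=1)
    rewards = np.zeros(len(pursuers))
    newly_seen = np.zeros(len(cells), dtype=bool)
    for i in range(len(pursuers)):
        mask = (owner == i) & ~visited & (dists[:, i] <= sensing_radius)
        rewards[i] = mask.sum()
        newly_seen |= mask
    return rewards, visited | newly_seen


if __name__ == "__main__":
    # A 4 x 4 grid of cell centers with two pursuers in opposite corners.
    xs, ys = np.meshgrid(np.arange(4.0), np.arange(4.0))
    cells = np.stack([xs.ravel(), ys.ravel()], axis=1)
    pursuers = np.array([[0.0, 0.0], [3.0, 3.0]])
    visited = np.zeros(len(cells), dtype=bool)
    rewards, visited = exploration_rewards(cells, pursuers, visited, 1.0)
    print(rewards, int(visited.sum()))
```

Restricting each pursuer's reward to its own Voronoi region is one simple way to discourage pursuers from crowding into the same unexplored area; in a role-based scheme, such a term would presumably be weighted more heavily for agents in an exploring role than for those tracking a target.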