Safe navigation of drones in the presence of adversarial physical attacks from multiple pursuers is a challenging task. This paper proposes a novel approach, asynchronous multi-stage deep reinforcement learning (AMS-DRL), to train adversarial neural networks that learn from the actions of multiple evolved pursuers and adapt quickly to their behavior, enabling the drone to avoid attacks and reach its target. Specifically, AMS-DRL evolves adversarial agents in a pursuit-evasion game in which the pursuers and the evader are trained asynchronously, in a bipartite-graph fashion, over multiple stages. A game-theoretic analysis shows that the approach guarantees convergence by ensuring a Nash equilibrium among the agents. We evaluate our method in extensive simulations and show that it achieves higher navigation success rates than the baselines. We also analyze how parameters such as the relative maximum speed affect navigation performance. Furthermore, we conduct physical experiments that validate the effectiveness of the trained policies in real-time flights. A success-rate heatmap is introduced to show how spatial geometry influences navigation outcomes. Project website: https://github.com/NTU-ICG/AMS-DRL-for-Pursuit-Evasion.
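The alternating structure described in the abstract, in which one side is trained while the other is held fixed, can be illustrated with a toy example. The sketch below is not the AMS-DRL algorithm: it replaces deep reinforcement learning with closed-form best responses in an assumed scalar game f(x, y) = x^2 + x*y - y^2, where a hypothetical "evader" minimizes over x and a hypothetical "pursuer" maximizes over y. All function names and the stopping rule are assumptions made for illustration only.

```python
from typing import List, Tuple


def train_evader(y: float) -> float:
    """Toy 'training' of the evader against a frozen pursuer.

    Returns argmin_x of x^2 + x*y - y^2, i.e. x = -y / 2.
    """
    return -y / 2.0


def train_pursuer(x: float) -> float:
    """Toy 'training' of the pursuer against a frozen evader.

    Returns argmax_y of x^2 + x*y - y^2, i.e. y = x / 2.
    """
    return x / 2.0


def alternate_stages(
    x0: float, y0: float, max_stages: int = 50, tol: float = 1e-6
) -> Tuple[float, float, List[Tuple[int, str, float, float]]]:
    """Alternate training stages until neither side changes its strategy.

    Even stages update the evader, odd stages update the pursuer.
    Stops once two consecutive stages (one per side) move by less
    than `tol`, a simple proxy for having reached an equilibrium.
    """
    x, y = x0, y0
    history: List[Tuple[int, str, float, float]] = []
    prev_change = float("inf")
    for stage in range(max_stages):
        if stage % 2 == 0:
            trained, new_x, new_y = "evader", train_evader(y), y
        else:
            trained, new_x, new_y = "pursuer", x, train_pursuer(x)
        change = abs(new_x - x) + abs(new_y - y)
        x, y = new_x, new_y
        history.append((stage, trained, x, y))
        if change < tol and prev_change < tol:
            break
        prev_change = change
    return x, y, history


if __name__ == "__main__":
    x, y, history = alternate_stages(1.0, 1.0)
    print(f"stopped after {len(history)} stages at x={x:.2e}, y={y:.2e}")
```

In this toy game the unique Nash equilibrium is (0, 0), and the alternating updates contract toward it because each best response halves the opponent's strategy. Whether a comparable stopping criterion is used in AMS-DRL itself is not stated in the abstract.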