This article presents a solution for intercepting an agile drone with another agile drone that carries a catching net. We formulate the interception as a Competitive Reinforcement Learning problem in which the interceptor and the target drone are controlled by separate policies trained with Proximal Policy Optimization (PPO). We introduce a high-fidelity simulation environment that integrates a realistic quadrotor dynamics model and a low-level control architecture, implemented in JAX for fast parallelized execution on GPUs. We train the agents with low-level control commands, collective thrust and body rates, to achieve agile flight for both the interceptor and the target. We compare the trained policies against common heuristic baselines in terms of catch rate, time to catch, and crash rate, and show that our solution outperforms these baselines when intercepting agile targets. Finally, we demonstrate the performance of the trained policies in a scaled real-world scenario with agile drones inside an indoor flight arena.
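The claim that a JAX implementation enables fast parallelized execution on GPUs rests on JAX's `vmap` and `jit` transformations, which vectorize and compile a single-environment step function across many simulated environments at once. The sketch below is a minimal illustration of that pattern only; the `step` function here is a toy point-mass integrator, not the paper's quadrotor dynamics model or control architecture.

```python
import jax
import jax.numpy as jnp

def step(state, u, dt=0.01):
    # Toy point-mass stand-in for the quadrotor dynamics:
    # state = [position (3), velocity (3)], u = commanded acceleration.
    # (The paper's actual action space is collective thrust and body rates.)
    pos, vel = state[:3], state[3:]
    new_vel = vel + u * dt
    new_pos = pos + new_vel * dt
    return jnp.concatenate([new_pos, new_vel])

# vmap vectorizes the single-environment step over a batch dimension;
# jit compiles the batched function so it runs as one fused GPU kernel.
batched_step = jax.jit(jax.vmap(step))

states = jnp.zeros((1024, 6))        # 1024 parallel environments
actions = 0.5 * jnp.ones((1024, 3))  # same commanded acceleration everywhere
next_states = batched_step(states, actions)
print(next_states.shape)  # (1024, 6)
```

Because the batched step is a pure function, it can also be wrapped in `jax.lax.scan` to roll out whole episodes on-device, which is what makes large-scale PPO training loops of this kind fast in practice.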