Recently, neural control policies have outperformed existing model-based planning-and-control methods for autonomously navigating quadrotors through cluttered environments in minimum time. However, these policies are not perception-aware, a crucial requirement in vision-based navigation because of the camera's limited field of view and the underactuated nature of a quadrotor. We propose a learning-based system that achieves perception-aware, agile flight in cluttered environments. Our method combines imitation learning with reinforcement learning (RL) by leveraging a privileged learning-by-cheating framework. Using RL, we first train a perception-aware teacher policy with full-state information to fly in minimum time through cluttered environments. Then, we use imitation learning to distill its knowledge into a vision-based student policy that perceives the environment only via a camera. Our approach tightly couples perception and control, showing a significant advantage in computation speed (10 times faster) and success rate. We demonstrate the closed-loop control performance using hardware-in-the-loop simulation.
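The distillation step of a learning-by-cheating pipeline can be illustrated with a toy sketch. This is not the authors' implementation: the teacher here is a fixed hand-written function rather than an RL-trained policy, the "camera" is assumed to be a simple mask that hides part of the state, the student is linear, and all names (`teacher_action`, `camera_observation`, `distill`) are hypothetical. It only shows the general idea of fitting a student that sees restricted observations to the actions of a teacher that sees the full state.

```python
import math
import random


def teacher_action(state, w_teacher):
    """Hypothetical stand-in for the privileged teacher: full state -> scalar command."""
    return math.tanh(sum(w * s for w, s in zip(w_teacher, state)))


def camera_observation(state, n_visible):
    """Hypothetical stand-in for the camera: only the first n_visible state entries."""
    return state[:n_visible]


def distill(states, w_teacher, n_visible, lr=0.1, epochs=200):
    """Behavior cloning: fit a linear student on observations to the teacher's actions."""
    # The teacher labels every state using information the student never sees.
    data = [(camera_observation(s, n_visible), teacher_action(s, w_teacher))
            for s in states]
    w_student = [0.0] * n_visible
    losses = []
    for _ in range(epochs):
        grad = [0.0] * n_visible
        loss = 0.0
        for obs, target in data:
            err = sum(w * o for w, o in zip(w_student, obs)) - target
            loss += err * err
            for i, o in enumerate(obs):
                grad[i] += 2.0 * err * o
        losses.append(loss / len(data))  # mean squared imitation error
        # Plain gradient-descent step on the mean squared error.
        w_student = [w - lr * g / len(data) for w, g in zip(w_student, grad)]
    return w_student, losses


if __name__ == "__main__":
    rng = random.Random(0)
    states = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(256)]
    w_teacher = [0.5, -0.3, 0.8, 0.1, -0.6, 0.4]  # arbitrary illustrative weights
    w_student, losses = distill(states, w_teacher, n_visible=4)
    print(f"imitation loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In this toy setting the imitation loss cannot reach zero, because the student never observes the hidden state entries. That residual gap is one intuition for why a teacher that already accounts for what the camera can see would be easier to imitate.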