Learning visuomotor policies for agile quadrotor flight presents significant challenges, primarily due to inefficient policy exploration caused by high-dimensional visual inputs and the need for precise, low-latency control. To address these challenges, we propose a novel approach that combines the performance of Reinforcement Learning (RL) with the sample efficiency of Imitation Learning (IL) in the task of vision-based autonomous drone racing. While RL provides a framework for learning high-performance controllers through trial and error, it suffers from poor sample efficiency and high computational demands due to the dimensionality of visual inputs. Conversely, IL learns efficiently from visual expert demonstrations, but it remains limited by the expert's performance and state distribution. To overcome these limitations, our policy learning framework integrates the strengths of both approaches. The framework comprises three phases: training a teacher policy using RL with privileged state information, distilling it into a student policy via IL, and adaptive fine-tuning via RL. Experiments in both simulation and the real world show that our approach not only learns in settings where RL from scratch fails, but also outperforms existing IL methods in both robustness and performance, successfully navigating a quadrotor through a race course using only visual information.
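The three-phase pipeline above can be illustrated with a minimal toy sketch. Everything here is an assumption for illustration, not the paper's implementation: the "quadrotor" is a two-variable toy problem, random-search hill climbing stands in for the RL phases, simple least-squares behaviour cloning stands in for IL distillation, and `privileged_obs` / `visual_obs` are hypothetical stand-ins for privileged state and visual features.

```python
import random

class LinearPolicy:
    """Toy linear policy: maps an observation vector to a scalar action."""
    def __init__(self, obs_dim):
        self.w = [0.0] * obs_dim

    def act(self, obs):
        return sum(wi * oi for wi, oi in zip(self.w, obs))

def privileged_obs(state):
    # Teacher sees the true low-dimensional state directly.
    return list(state)

def visual_obs(state):
    # Stand-in for a visual encoding: a redundant transform of the state.
    x, v = state
    return [x, v, x + v, x - v]

def reward(state, action):
    # Toy objective: the ideal action is -(x + v).
    x, v = state
    return -((action + x + v) ** 2)

def evaluate(policy, obs_fn, states):
    return sum(reward(s, policy.act(obs_fn(s))) for s in states) / len(states)

def train_rl(policy, obs_fn, states, iters=300, step=0.1, seed=0):
    # Random-search hill climbing as a minimal stand-in for an RL phase:
    # perturb the weights, keep the change only if average reward improves.
    rng = random.Random(seed)
    best = evaluate(policy, obs_fn, states)
    for _ in range(iters):
        backup = list(policy.w)
        policy.w = [w + step * rng.gauss(0, 1) for w in policy.w]
        score = evaluate(policy, obs_fn, states)
        if score > best:
            best = score
        else:
            policy.w = backup
    return policy

def distill(teacher, student, states, epochs=200, lr=0.05):
    # Behaviour cloning: regress student actions (from "visual" features)
    # onto teacher actions (from privileged state) via SGD.
    for _ in range(epochs):
        for s in states:
            target = teacher.act(privileged_obs(s))
            obs = visual_obs(s)
            err = student.act(obs) - target
            student.w = [w - lr * err * o for w, o in zip(student.w, obs)]
    return student

rng = random.Random(1)
states = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(50)]

teacher = train_rl(LinearPolicy(2), privileged_obs, states)         # phase 1: RL with privileged state
student = distill(teacher, LinearPolicy(4), states)                 # phase 2: IL distillation
student = train_rl(student, visual_obs, states, iters=100, seed=2)  # phase 3: RL fine-tuning
```

Because the fine-tuning phase only accepts weight changes that improve reward, the student can never end up worse than the distilled policy it started from, which mirrors the motivation for fine-tuning after distillation rather than running RL from scratch.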