We combine the effectiveness of Reinforcement Learning (RL) and the efficiency of Imitation Learning (IL) in the context of vision-based autonomous drone racing. We focus on processing visual input directly, without explicit state estimation. While RL offers a general framework for learning complex controllers through trial and error, the high dimensionality of visual inputs makes it sample-inefficient and computationally demanding. Conversely, IL learns efficiently from visual demonstrations but is limited by the quality of those demonstrations and suffers from issues such as covariate shift. To overcome these limitations, we propose a novel training framework that combines the advantages of RL and IL. Our framework involves three stages: first, training a teacher policy using privileged state information; second, distilling this policy into a student policy using IL; and third, fine-tuning with performance-constrained adaptive RL. Our experiments in both simulated and real-world environments demonstrate that our approach achieves better performance and robustness than IL or RL alone when navigating a quadrotor through a racing course using only visual information, without explicit state estimation.