Learning to Fly in Seconds

Learning-based methods, particularly Reinforcement Learning (RL), hold great promise for streamlining deployment, enhancing performance, and achieving generalization in the control of autonomous multirotor aerial vehicles. Deep RL has been able to control complex systems with impressive fidelity and agility in simulation, but simulation-to-reality transfer often suffers from a hard-to-bridge reality gap. Moreover, RL is commonly plagued by prohibitively long training times. In this work, we propose a novel asymmetric actor-critic-based architecture coupled with a highly reliable RL-based training paradigm for end-to-end quadrotor control. We show how curriculum learning and a highly optimized simulator improve sample efficiency and lead to fast training times. To precisely discuss the challenges related to low-level/end-to-end multirotor control, we also introduce a taxonomy that classifies the existing levels of control abstraction as well as non-linearities and domain parameters. Our framework enables Simulation-to-Reality (Sim2Real) transfer for direct RPM control after only 18 seconds of training on a consumer-grade laptop, as well as deployment on microcontrollers to control a multirotor under real-time guarantees. Finally, our solution exhibits competitive performance in trajectory tracking, as demonstrated through various experimental comparisons with existing state-of-the-art control solutions using a real Crazyflie nano quadrotor. We open-source the code, including a very fast multirotor dynamics simulator that can simulate about 5 months of flight per second on a laptop GPU. The fast training times and deployment to a cheap, off-the-shelf quadrotor lower the barriers to entry and help democratize the research and development of these systems.
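
To give a sense of scale for the reported simulator throughput, the figure of about 5 months of flight per second can be converted into a real-time factor, i.e. simulated seconds per wall-clock second. The sketch below is only back-of-the-envelope arithmetic: the 30-day month is our assumption, and the function name `real_time_factor` is hypothetical and not part of the released code.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: the abstract does not define "month"


def real_time_factor(months_per_second: float,
                     days_per_month: int = DAYS_PER_MONTH) -> float:
    """Return simulated seconds produced per wall-clock second."""
    return months_per_second * days_per_month * SECONDS_PER_DAY


# "About 5 months of flight per second", as stated in the abstract.
factor = real_time_factor(5.0)
print(f"{factor:,.0f} simulated seconds per wall-clock second")
```

Under this assumption, the simulator runs on the order of ten million times faster than real time.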
