Learning Agile Quadrotor Flight in the Real World

Learning-based controllers have achieved impressive performance in agile quadrotor flight, but they typically rely on extensive training in simulation, which requires accurate system identification for effective Sim2Real transfer. Even with precise modeling, fixed policies remain susceptible to out-of-distribution scenarios, ranging from external aerodynamic disturbances to internal hardware degradation. To ensure safety under these evolving uncertainties, such controllers are forced to operate with conservative safety margins, which constrains their agility outside of controlled settings. While online adaptation offers a potential remedy, safely exploring physical limits remains a critical bottleneck due to data scarcity and safety risks. To bridge this gap, we propose a self-adaptive framework that eliminates the need for precise system identification or offline Sim2Real transfer. We introduce Adaptive Temporal Scaling (ATS) to actively explore the platform's physical limits, and employ online residual learning to augment a simple nominal model. Based on the learned hybrid model, we further propose Real-world Anchored Short-horizon Backpropagation Through Time (RASH-BPTT) to achieve efficient and robust in-flight policy updates. Extensive experiments demonstrate that our quadrotor reliably executes agile maneuvers near actuator saturation limits. Starting from a conservative base policy with a peak speed of 1.9 m/s, the system reaches a peak speed of 7.3 m/s within approximately 100 seconds of flight time. These findings underscore that real-world adaptation serves not merely to compensate for modeling errors, but also as a practical mechanism for sustained performance improvement in aggressive flight regimes.
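
To make the idea of temporal scaling concrete, the following is a minimal sketch of what replaying a reference trajectory faster does to its velocity. It is not the ATS method itself: the abstract does not describe how the scaling factor is adapted, so the sketch uses a fixed factor. The circular reference, the function names `circle_reference` and `time_scaled`, and the choice of radius are all hypothetical and chosen only so that the base speed matches the 1.9 m/s figure quoted above.

```python
import math

def circle_reference(t, radius=1.0, omega=1.0):
    """Hypothetical reference: position and velocity on a circle at time t."""
    pos = (radius * math.cos(omega * t), radius * math.sin(omega * t))
    vel = (-radius * omega * math.sin(omega * t),
           radius * omega * math.cos(omega * t))
    return pos, vel

def time_scaled(reference, scale):
    """Return a reference that replays `reference` `scale` times faster."""
    def scaled(t):
        pos, vel = reference(scale * t)
        # Chain rule: d/dt p(scale * t) = scale * p'(scale * t),
        # so the geometric path is unchanged and the velocity grows by `scale`.
        return pos, tuple(scale * v for v in vel)
    return scaled

def speed(vel):
    """Euclidean norm of a 2D velocity."""
    return math.hypot(*vel)

# Assumed base trajectory: constant speed radius * omega = 1.9 m/s.
base = lambda t: circle_reference(t, radius=1.9, omega=1.0)

# Fixed scaling factor that maps the base peak speed onto the adapted one.
scale = 7.3 / 1.9
fast = time_scaled(base, scale)

print(f"base speed:   {speed(base(0.3)[1]):.2f} m/s")
print(f"scaled speed: {speed(fast(0.3)[1]):.2f} m/s")
```

Under pure time scaling, velocity grows linearly with the factor while acceleration grows with its square, so going from 1.9 m/s to 7.3 m/s on the same path (a factor of about 3.8) demands roughly 15 times the acceleration. That quadratic growth is one reason the scaling factor cannot simply be set high from the start and has to be raised as the platform's limits are learned.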
