SkyDreamer: Interpretable End-to-End Vision-Based Drone Racing with Model-Based Reinforcement Learning

Autonomous drone racing (ADR) systems have recently achieved champion-level performance, yet remain highly specific to drone racing. While end-to-end vision-based methods promise broader applicability, no system to date simultaneously achieves full sim-to-real transfer, onboard execution, and champion-level performance. In this work, we present SkyDreamer, which is, to the best of our knowledge, the first end-to-end vision-based ADR policy that maps directly from pixel-level representations to motor commands. SkyDreamer builds on informed Dreamer, a model-based reinforcement learning approach in which the world model decodes to privileged information that is available only during training. By extending this concept to end-to-end vision-based ADR, the world model effectively functions as an implicit state and parameter estimator, greatly improving interpretability. SkyDreamer runs fully onboard without external aid, resolves visual ambiguities by tracking progress using the state decoded from the world model's hidden state, and requires no extrinsic camera calibration, enabling rapid deployment across different drones without retraining. Real-world experiments show that SkyDreamer achieves robust, high-speed flight, executing tight maneuvers such as an inverted loop, a split-S, and a ladder, and reaching speeds of up to 21 m/s and accelerations of up to 6 g. It further demonstrates a non-trivial visual sim-to-real transfer by operating on poor-quality segmentation masks, and exhibits robustness to battery depletion by accurately estimating the maximum attainable motor RPM and adjusting its flight path in real time. These results highlight SkyDreamer's adaptability to important aspects of the reality gap, bringing robustness while still achieving extremely high-speed, agile flight.

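The idea of a world model that "decodes to privileged information" can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain linear decoder head fitted by gradient descent on synthetic data, whereas the actual system uses a Dreamer-style learned world model. The names `decode_privileged` and `train_decoder`, the 8-dimensional latent, and the 4-dimensional privileged vector (standing in for quantities such as state or maximum attainable motor RPM) are hypothetical placeholders.

```python
import numpy as np


def decode_privileged(latent, weights, bias):
    """Linear decoder head: latent features -> predicted privileged vector."""
    return latent @ weights + bias


def train_decoder(latents, privileged, lr=0.1, steps=500):
    """Fit the decoder head by gradient descent on mean squared error."""
    n, latent_dim = latents.shape
    state_dim = privileged.shape[1]
    weights = np.zeros((latent_dim, state_dim))
    bias = np.zeros(state_dim)
    losses = []
    for _ in range(steps):
        error = decode_privileged(latents, weights, bias) - privileged
        losses.append(float(np.mean(error ** 2)))
        # Gradients of the mean squared error w.r.t. weights and bias.
        scale = 2.0 / (n * state_dim)
        weights -= lr * scale * (latents.T @ error)
        bias -= lr * scale * error.sum(axis=0)
    return weights, bias, losses


# Synthetic stand-in data (assumption): in training, the privileged targets
# would come from the simulator; here they are a noisy linear map of the latent.
rng = np.random.default_rng(0)
latents = rng.normal(size=(256, 8))
true_map = rng.normal(size=(8, 4))
privileged = latents @ true_map + 0.01 * rng.normal(size=(256, 4))

weights, bias, losses = train_decoder(latents, privileged)
estimate = decode_privileged(latents[:1], weights, bias)  # inspectable estimate
```

The targets are needed only to fit the head. Once trained, the decoder reads an estimate out of the latent alone, which is what makes the hidden state inspectable as an implicit state and parameter estimate.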