We propose a new simulator, training approach, and policy architecture, collectively called SOUS VIDE, for end-to-end visual drone navigation. Our trained policies exhibit zero-shot sim-to-real transfer with robust real-world performance using only on-board perception and computation. Our simulator, called FiGS, couples a computationally simple drone dynamics model with a high visual fidelity Gaussian Splatting scene reconstruction. FiGS can quickly simulate drone flights producing photorealistic images at up to 130 fps. We use FiGS to collect 100k-300k observation-action pairs from an expert MPC with privileged state and dynamics information, randomized over dynamics parameters and spatial disturbances. We then distill this expert MPC into an end-to-end visuomotor policy with a lightweight neural architecture, called SV-Net. SV-Net processes color image, optical flow, and IMU data streams into low-level body rate and thrust commands at 20 Hz onboard a drone. Crucially, SV-Net includes a Rapid Motor Adaptation (RMA) module that adapts at runtime to variations in drone dynamics. In a campaign of 105 hardware experiments, we show SOUS VIDE policies to be robust to 30% mass variations, 40 m/s wind gusts, 60% changes in ambient brightness, shifting or removing objects from the scene, and people moving aggressively through the drone's visual field. Code, data, and experiment videos can be found on our project page: https://stanfordmsl.github.io/SousVide/.