Guided Diffusion with Distilled Vision-Language Reliability for Aerial Navigation

Autonomous UAV navigation is conventionally solved by pipelines that split perception, mapping, and planning into separate stages, which propagates errors, accumulates latency, and requires environment-specific retuning. End-to-end generative models remove these interfaces by mapping raw observations directly to trajectories, but they introduce a subtle failure mode: trained on clean data, they cannot recognise when an observation is unreliable, and so treat degraded regions such as glass, mirrors, and overexposed surfaces as valid evidence for planning. We present a reliability-aware diffusion planner for 3D UAV navigation. It conditions trajectory generation on the observation together with a scene-level reliability heatmap that marks where perception cannot be trusted; the heatmap is produced within the real-time planning budget by a lightweight network that distils the open-vocabulary reasoning of a vision-language model. To generalise to unseen environments without retraining, we steer the denoising process with a differentiable two-stage ESDF cost that treats physical obstacles (from depth) and virtual obstacles (from highly unreliable regions) on equal footing. In simulation and on a real quadrotor, our planner produces markedly safer trajectories than a state-of-the-art diffusion baseline, reducing the obstacle-violation rate from 40.3% to 9.6% and raising the mean reliability of traversed regions from 0.588 to 0.925. Removing the reliability term alone lowers mean reliability from 0.898 to 0.783, confirming it as the decisive component, while distillation makes the framework up to twice as fast as running the full vision-language model.
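The abstract says the denoiser is steered by a differentiable two-stage ESDF cost over physical and virtual obstacles, but does not spell out that cost. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. We read "two-stage" as (1) fusing depth-occupied cells and cells whose reliability falls below a threshold `tau` into one obstacle map, then (2) computing a signed ESDF and a hinge-squared safety-margin cost whose gradient shifts each denoised mean. The threshold `tau`, the hinge form, `margin`, `scale`, and all function names (`fuse_obstacles`, `signed_esdf`, `esdf_cost_and_grad`, `guided_sample`) are hypothetical. `denoise_mean` is a placeholder for the learned network conditioned on the observation and heatmap.

```python
import numpy as np


def fuse_obstacles(depth_occupancy, reliability, tau=0.3):
    """Assumed stage 1: a cell is an obstacle if depth marks it occupied
    (physical) OR its reliability is below the hypothetical threshold tau
    (virtual). Both kinds are treated identically from here on."""
    return np.asarray(depth_occupancy, dtype=bool) | (np.asarray(reliability) < tau)


def _nearest_dist(cells, targets):
    """Euclidean distance from each row of `cells` to its closest row in `targets`."""
    diff = cells[:, None, :] - targets[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def signed_esdf(occupancy, resolution=1.0):
    """Brute-force signed ESDF in metres: distance to the nearest obstacle
    outside, minus distance to the nearest free cell inside. Cost is
    O(cells x obstacles), so this only suits small illustrative grids."""
    occ = np.asarray(occupancy, dtype=bool)
    if not occ.any():
        return np.full(occ.shape, np.inf)  # no obstacles anywhere
    cells = np.indices(occ.shape).reshape(occ.ndim, -1).T.astype(float)
    flat = occ.ravel()
    d_out = np.zeros(flat.size)
    d_in = np.zeros(flat.size)
    d_out[~flat] = _nearest_dist(cells[~flat], cells[flat])
    if (~flat).any():
        d_in[flat] = _nearest_dist(cells[flat], cells[~flat])
    return ((d_out - d_in) * resolution).reshape(occ.shape)


def esdf_cost_and_grad(traj, dist, resolution, margin):
    """Assumed stage 2: cost sum_k max(0, margin - d(p_k))^2 and its gradient
    w.r.t. each waypoint p_k. d and its spatial gradient are read at the
    nearest grid cell; a real planner would interpolate trilinearly."""
    traj = np.asarray(traj, dtype=float)
    if np.isinf(dist).all():  # obstacle-free map: nothing to avoid
        return 0.0, np.zeros_like(traj)
    grads = np.gradient(dist, resolution)  # one array per axis
    if dist.ndim == 1:
        grads = [grads]
    idx = np.round(traj / resolution).astype(int)
    idx = np.clip(idx, 0, np.array(dist.shape) - 1)
    cell = tuple(idx.T)
    violation = np.maximum(0.0, margin - dist[cell])
    grad_d = np.stack([g[cell] for g in grads], axis=-1)
    cost = float((violation ** 2).sum())
    grad = -2.0 * violation[:, None] * grad_d  # chain rule through the hinge
    return cost, grad


def guided_sample(x_T, denoise_mean, dist, resolution, margin, scale, n_steps):
    """Cost-guided reverse diffusion: at each step take the denoiser's mean
    and shift it down the cost gradient. The sampling noise term is omitted
    so the sketch is deterministic."""
    x = np.asarray(x_T, dtype=float).copy()
    for t in reversed(range(n_steps)):
        mu = denoise_mean(x, t)  # hypothetical stand-in for the learned model
        _, grad = esdf_cost_and_grad(mu, dist, resolution, margin)
        x = mu - scale * grad
    return x
```

The example is written for grids of any dimension, so the same calls apply to a 3D map. A signed ESDF is used because an unsigned one is zero throughout an obstacle and has zero gradient there, so a waypoint sitting inside a low-reliability glass patch would never be pushed out. Nearest-cell lookup keeps the sketch short. The gradient is therefore piecewise constant rather than smooth, which is one reason a production planner would interpolate.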

翻译：自主无人机导航传统上通过将感知、建图和规划分解为独立阶段的管线来解决，这会传播误差、累积延迟，并需要针对特定环境重新调参。端到端生成模型通过将原始观测直接映射为轨迹消除了这些接口，但继承了微妙的失败模式：在干净数据上训练的模型无法识别观测不可靠的情况，并将玻璃、镜面和过曝表面等退化区域视为有效证据用于规划。我们提出了一种适用于三维无人机导航的可靠性感知扩散规划器。它根据观测和场景级可靠性热图来约束轨迹生成，该热图标记了感知不可信的区域，由轻量级网络在实时规划预算内通过蒸馏视觉语言模型的开集推理能力生成。为了无需重新训练即可泛化至未知环境，我们采用可微分的两阶段ESDF代价函数引导去噪过程，该代价函数将来自深度数据的物理障碍和来自高度不可靠区域的虚拟障碍一视同仁。在仿真和真实四旋翼飞行器上，我们的规划器相比最先进的扩散基线产生了显著更安全的轨迹，将障碍违规率从40.3%降至9.6%，并将穿越区域的平均可靠性从0.588提升至0.925。仅消融可靠性项就会使平均可靠性从0.898降至0.783，证实了其关键作用，而蒸馏使框架运行速度比完整视觉语言模型快2倍。