FIRE-VLM: A Vision-Language-Driven Reinforcement Learning Framework for UAV Wildfire Tracking in a Physics-Grounded Fire Digital Twin

Wildfire monitoring demands autonomous systems that can reason under extreme visual degradation, rapidly evolving physical dynamics, and scarce real-world training data. Existing UAV navigation approaches rely on simplified simulators and supervised perception pipelines, and lack embodied agents that interact with physically realistic fire environments. We introduce FIRE-VLM, the first end-to-end, vision-language model (VLM)-guided reinforcement learning (RL) framework trained entirely within a high-fidelity, physics-grounded wildfire digital twin. Built from USGS Digital Elevation Model (DEM) terrain, LANDFIRE fuel inventories, and semi-physical fire-spread solvers, the twin captures terrain-induced runs, wind-driven acceleration, smoke-plume occlusion, and dynamic fuel consumption. Within this environment, a PPO agent with dual-view UAV sensing is guided by a CLIP-style VLM. Wildfire-specific semantic alignment scores, derived from a single prompt describing active fire and smoke plumes, are integrated as potential-based reward-shaping signals. Our contributions are: (1) a GIS-to-simulation pipeline for constructing wildfire digital twins; (2) a VLM-guided RL agent for UAV firefront tracking; and (3) a wildfire-aware reward design that combines physical terms with VLM semantics. Across five digital-twin evaluation tasks, our VLM-guided policy reduces time-to-detection by up to a factor of 6, increases time-in-FOV, and is, to our knowledge, the first RL-based UAV wildfire monitoring system demonstrated in kilometer-scale, physics-grounded digital-twin fires.
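To make the reward-shaping idea concrete, the following is a minimal sketch rather than the FIRE-VLM implementation. It assumes the potential Φ(s) is the cosine similarity between a CLIP-style image embedding of the current UAV view and the embedding of the fire-and-smoke prompt, and it uses the standard potential-based form F = γΦ(s') − Φ(s). The function names, the shaping weight, the discount factor, and the toy 2-D embeddings are all hypothetical and are not taken from the abstract.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def shaped_reward(base_reward, phi_s, phi_next, gamma=0.99, weight=1.0):
    """Add a potential-based shaping term gamma * phi(s') - phi(s).

    base_reward: environment (physical) reward for the transition.
    phi_s, phi_next: potentials of the current and next state
        (assumed here to be VLM alignment scores).
    gamma, weight: hypothetical discount factor and shaping weight.
    """
    return base_reward + weight * (gamma * phi_next - phi_s)


# Toy stand-ins for real embeddings (assumption: a real system would obtain
# these from a CLIP-style image encoder and text encoder).
text_emb = np.array([1.0, 0.0])   # prompt describing active fire and smoke
img_far = np.array([0.0, 1.0])    # view with no fire-like content
img_near = np.array([1.0, 0.0])   # view dominated by fire and smoke

phi_far = cosine_similarity(img_far, text_emb)
phi_near = cosine_similarity(img_near, text_emb)

# Moving from the uninformative view to the fire view earns a positive bonus.
r = shaped_reward(0.0, phi_far, phi_near, gamma=0.99)
print(phi_far, phi_near, r)
```

Because the shaping term is a difference of potentials, the bonuses telescope along a trajectory, which is why this form is commonly used when the aim is to guide exploration without changing which policies are optimal.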
