FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle

Predicting wildfire risk is a reasoning-intensive spatial problem that requires the integration of visual, climatic, and geographic factors to infer continuous risk maps. Existing methods lack the causal reasoning and multimodal understanding required for reliable generalization. We introduce $\textbf{FireScope-Bench}$, a large-scale dataset and benchmark that couples Sentinel-2 imagery and climate data with expert-defined risk rasters across the USA, and real wildfire events in Europe for cross-continental evaluation. Building on this dataset, we propose $\textbf{FireScope}$, a VLM-based reasoning-to-generation framework that learns from both reinforcement learning and visual supervision to predict risk rasters with complementary reasoning traces. When trained in the USA and tested in Europe, $\textbf{FireScope}$ achieves substantial performance gains, while expert feedback and automated analysis confirm that its reasoning traces are faithful and semantically meaningful. Our findings demonstrate that reasoning can ground raster prediction models, improving both generalization and interpretability. To our knowledge, this is the first framework to (1) demonstrate that language-based reasoning can improve generalization in visual generation, (2) propose a high-resolution wildfire risk model that can be applied across continents, and (3) enable systematic studies of robust cross-continental generalization for multimodal fire risk models. We believe that $\textbf{FireScope-Bench}$ has the potential to serve as a foundation for advancing reasoning-driven, interpretable and generalizable spatial modeling. Data and source code will be made publicly available.
