A Nationwide Benchmark for Wildfire Initial Attack Failure Prediction with Public Environmental Data

Initial attack (IA) is the first wildfire suppression phase, when agencies must quickly decide which fires may escape early control. Existing IA failure prediction studies often use non-public response records or regional settings, so it remains unclear how well public data available at fire discovery time can support IA failure prediction at national scale. We present WILDFIREIA, the first U.S. national-scale benchmark for IA failure prediction from environmental and contextual data available at fire discovery time. WILDFIREIA aligns 38,128 naturally caused FPA-FOD wildfire events with FIRMS/VIIRS thermal detections, gridMET weather and fire-danger variables, LANDFIRE vegetation, fuel, and topography, OpenStreetMap access features, and WorldPop population density. To prevent data leakage, the benchmark fixes the event unit, size-based label rule, chronological split, metrics, and forbidden-feature list, and excludes final fire size, containment timestamps, and post-discovery satellite detections from model inputs. We evaluate 16 representative models across tabular, temporal, spatial, and spatiotemporal families under the same protocol. Results show that public discovery-time data provides useful but incomplete signal for IA failure prediction: XGBoost achieves the best AUPRC of 53.3%; FIRMS/VIIRS is the least redundant source; and fuel is the strongest static predictor when dynamic observations are unavailable. We release preprocessing outputs and model-ready caches to support reproducible research on early wildfire risk assessment: https://github.com/LabRAI/WildfireIA#.
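The leakage controls named above (a chronological split, a forbidden-feature list, and AUPRC as the headline metric) can be illustrated with a minimal sketch. This is not the benchmark's released code: the field names (`discovery_date`, `final_fire_size`, `containment_time`, `post_discovery_detections`, `ia_failure`, `erc`), the cutoff date, and the toy records are hypothetical placeholders chosen for illustration, and the AUPRC here is the standard step-wise average-precision estimate, which may differ from the exact estimator used in the paper.

```python
from datetime import date

# Hypothetical names for leakage-prone fields; the benchmark's real schema may differ.
FORBIDDEN_FEATURES = {"final_fire_size", "containment_time", "post_discovery_detections"}


def strip_forbidden(event):
    """Return a copy of the event without fields unknown at discovery time."""
    return {k: v for k, v in event.items() if k not in FORBIDDEN_FEATURES}


def chronological_split(events, cutoff):
    """Train on events discovered before the cutoff date, test on the rest."""
    ordered = sorted(events, key=lambda e: e["discovery_date"])
    train = [strip_forbidden(e) for e in ordered if e["discovery_date"] < cutoff]
    test = [strip_forbidden(e) for e in ordered if e["discovery_date"] >= cutoff]
    return train, test


def average_precision(labels, scores):
    """Step-wise AUPRC: mean precision at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Toy records, invented purely to exercise the functions.
events = [
    {"discovery_date": date(2019, 7, 4), "erc": 91, "final_fire_size": 1200.0, "ia_failure": 1},
    {"discovery_date": date(2015, 6, 1), "erc": 40, "final_fire_size": 0.5, "ia_failure": 0},
    {"discovery_date": date(2017, 8, 15), "erc": 75, "containment_time": "2017-08-20", "ia_failure": 1},
]
train, test = chronological_split(events, cutoff=date(2018, 1, 1))
```

Splitting by discovery date rather than at random keeps later fires out of training, and removing the forbidden fields before any model sees the data means that outcome information such as final fire size can only enter through the label.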