We introduce FLUXtrapolation, a benchmark for extrapolating ecosystem fluxes under progressively harder distribution shifts. Ecosystem fluxes are central to understanding the carbon, water, and energy cycles, yet they can be measured directly only at sparsely located measurement towers. Producing global flux estimates therefore requires upscaling: training models at observed sites on globally available covariates and predicting in unobserved regions. Flux upscaling is a challenging domain generalization problem. It is affected by covariate shift across climates, ecosystem types, and environmental conditions, and by conditional shift, because important drivers remain unobserved at global scale. We provide a quantitative analysis of both shifts, in $P_X$ and in $P_{Y\mid X}$. FLUXtrapolation is designed around domain expertise in flux upscaling: it defines temporal, spatial, and temperature-based extrapolation scenarios and evaluates performance across held-out domains, at multiple temporal aggregations, and on tail errors. In a pilot study, we find that baselines perform similarly under median hourly RMSE but separate under the proposed tail-focused and multi-scale evaluation. FLUXtrapolation therefore poses a realistic and relevant challenge for machine learning methods under distribution shift; at the same time, progress on this benchmark would directly support the scientific goal of improving flux upscaling.
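As a rough illustration of what a tail-focused, multi-scale evaluation might look like, the sketch below computes RMSE per held-out site at a chosen temporal aggregation and then reports both the median and an upper quantile across sites. The abstract does not specify the benchmark's exact metrics, so everything here is an assumption: the function names (`rmse`, `block_mean`, `per_site_rmse`, `summarize`) are hypothetical, the "tail" is taken to be an upper quantile of per-site RMSE, and temporal aggregation is taken to be the mean over non-overlapping blocks of hourly samples.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def block_mean(x, window):
    """Average consecutive non-overlapping blocks of `window` samples.

    Any incomplete trailing block is dropped (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    n = (len(x) // window) * window
    return x[:n].reshape(-1, window).mean(axis=1)


def per_site_rmse(y_true, y_pred, site_ids, window=1):
    """RMSE for each site after aggregating over `window` consecutive samples.

    window=1 keeps the native (e.g. hourly) resolution; window=24 would give
    daily means if the input is hourly and ordered in time within each site.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    site_ids = np.asarray(site_ids)
    result = {}
    for site in np.unique(site_ids):
        mask = site_ids == site
        result[site] = rmse(
            block_mean(y_true[mask], window),
            block_mean(y_pred[mask], window),
        )
    return result


def summarize(site_rmse, tail_q=0.9):
    """Median and upper-quantile RMSE across sites (the quantile is the 'tail')."""
    values = np.array(list(site_rmse.values()), dtype=float)
    return {
        "median": float(np.median(values)),
        "tail": float(np.quantile(values, tail_q)),
    }
```

Under these assumptions, two models with the same median hourly RMSE can still differ in `summarize(...)["tail"]` or in `per_site_rmse(..., window=24)`, which is the kind of separation the pilot study reports.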