Estimating spatial extremes from sparse observational networks produces uncertain return level maps, but dense output from physics-based simulation models is often available as a complementary data source. We develop a two-stage frequentist framework for fusing observations and simulations. In Stage 1, generalized extreme value (GEV) distributions are fitted independently at each site, with a nonstationary location parameter where appropriate to accommodate observed trends. In Stage 2, the parameter estimates from all sources are modeled jointly as a high-dimensional spatial process through a linear model of coregionalization (LMC). Cross-source correlations, estimated from spatially interspersed networks without co-located sites, provide the mechanism for information transfer, and an analytic gradient of the resulting likelihood keeps estimation computationally practical. We apply the framework to U.S. coastal sea levels over 1979–2021, fusing 29 NOAA tide gauge records with 100 ADCIRC hydrodynamic simulation sites. Leave-one-out cross-validation shows a 35% reduction in 100-year return level RMSE relative to a gauge-only model. Geographic block cross-validation confirms that the benefits of fusion persist under spatial extrapolation. The approach is implemented in the R package evfuse.
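The following minimal Python sketch makes the two stages concrete. It does not use the evfuse package, which is written in R. It computes a GEV T-year return level from Stage 1 parameters with a trending location. It also computes the Stage 2 LMC cross-covariance between two parameter fields, for example a gauge-based and a simulation-based location parameter. Several pieces are hypothetical illustrations, not the paper's fitted model or the package's API: the linear time trend, the exponential correlation functions, the helper names `gev_return_level`, `trend_location`, and `lmc_cross_cov`, and all numerical values.

```python
import math

# Hypothetical helpers for illustration only; not the evfuse API.


def gev_return_level(mu, sigma, xi, T):
    """T-year return level z_T of a GEV(mu, sigma, xi) annual-maximum distribution.

    z_T solves F(z_T) = 1 - 1/T, i.e. it is exceeded with probability 1/T per year.
    """
    if sigma <= 0 or T <= 1:
        raise ValueError("need sigma > 0 and T > 1")
    y = -math.log(1.0 - 1.0 / T)  # y_T = -log(1 - 1/T)
    if abs(xi) < 1e-8:
        return mu - sigma * math.log(y)  # Gumbel limit as xi -> 0
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


def trend_location(mu0, mu1, year, ref_year):
    """Assumed nonstationary location: mu(t) = mu0 + mu1 * (t - ref_year)."""
    return mu0 + mu1 * (year - ref_year)


def lmc_cross_cov(A, ranges, d):
    """LMC cross-covariance C(d) = sum_k a_k a_k^T rho_k(d).

    A is a p x q list of rows; a_k is its k-th column. Each rho_k is an
    exponential correlation exp(-d / ranges[k]) (an assumed choice).
    """
    p = len(A)
    C = [[0.0] * p for _ in range(p)]
    for k, r in enumerate(ranges):
        rho = math.exp(-d / r)
        for i in range(p):
            for j in range(p):
                C[i][j] += A[i][k] * A[j][k] * rho
    return C


# Toy setup: component 0 = gauge location field, component 1 = simulation location field.
A = [[1.0, 0.0],
     [0.8, 0.6]]
ranges = [500.0, 100.0]  # illustrative correlation ranges (e.g. km)

C0 = lmc_cross_cov(A, ranges, 0.0)
C50 = lmc_cross_cov(A, ranges, 50.0)
cross_corr_0 = C0[0][1] / math.sqrt(C0[0][0] * C0[1][1])
print(f"cross-source correlation at d=0: {cross_corr_0:.3f}")
print(f"cross-covariance at d=50:        {C50[0][1]:.3f}")

mu_2021 = trend_location(mu0=1.0, mu1=0.003, year=2021, ref_year=1979)
print(f"100-year return level in 2021:   {gev_return_level(mu_2021, 0.3, 0.05, 100):.3f}")
```

In this toy setup, both fields load on the first latent component, so their covariance stays nonzero at positive separations (here `0.8 * exp(-d/500)`). That behavior illustrates how cross-source dependence can, in principle, be learned from interspersed gauge and simulation sites that are never co-located.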