Increasingly large and complex spatial datasets pose major inferential challenges because of their high computational and storage costs. Our study is motivated by the KAUST Competition on Large Spatial Datasets 2023, which tasked participants with estimating spatial covariance-related parameters and predicting values at test sites, along with uncertainty estimates. We compared various statistical and deep learning approaches through cross-validation and ultimately selected the Vecchia approximation technique for model fitting. The R package GpGp lacked support for fitting zero-mean Gaussian processes and for direct uncertainty estimation, both of which the competition required; to overcome these constraints, we developed additional R functions. We also implemented subsampling-based approximations and parametric smoothing for skewed sampling distributions of the estimators. Our team, DesiBoys, won two of the four sub-competitions, which supports the effectiveness of our proposed strategies. Moreover, we extended our evaluation to a large, real, satellite-derived spatial dataset on total precipitable water, where we compared the predictive performance of different models using multiple diagnostics.
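For readers unfamiliar with the Vecchia approximation named above, the following is a minimal sketch of the idea for a zero-mean Gaussian process: the joint density is replaced by a product of univariate conditional densities, each conditioned on at most m nearest neighbours among the previously ordered points. This is an illustration only and not the authors' GpGp-based implementation; the exponential covariance, the nugget term, the use of the given point ordering, and the function names `exp_cov`, `vecchia_loglik`, and `exact_loglik` are all assumptions made for the example.

```python
import numpy as np


def exp_cov(a, b, variance, range_):
    """Exponential covariance between two location sets (illustrative kernel choice)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return variance * np.exp(-d / range_)


def vecchia_loglik(y, locs, variance, range_, nugget, m):
    """Vecchia approximation to the zero-mean Gaussian log-likelihood.

    Each y[i] is conditioned only on its (at most) m nearest neighbours
    among the points that precede it in the given ordering.
    """
    total = 0.0
    for i in range(len(y)):
        k_ii = variance + nugget  # marginal variance of y[i]
        if i == 0:
            mean, var = 0.0, k_ii  # first point has no conditioning set
        else:
            # Nearest neighbours among earlier points only.
            d = np.linalg.norm(locs[:i] - locs[i], axis=1)
            nb = np.argsort(d)[:m]
            k_nn = exp_cov(locs[nb], locs[nb], variance, range_)
            k_nn = k_nn + nugget * np.eye(nb.size)
            k_in = exp_cov(locs[i:i + 1], locs[nb], variance, range_)[0]
            w = np.linalg.solve(k_nn, k_in)  # kriging weights
            mean = w @ y[nb]                 # conditional mean
            var = k_ii - w @ k_in            # conditional variance
        total += -0.5 * (np.log(2.0 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total


def exact_loglik(y, locs, variance, range_, nugget):
    """Exact zero-mean Gaussian log-likelihood, for comparison on small n."""
    n = len(y)
    k = exp_cov(locs, locs, variance, range_) + nugget * np.eye(n)
    _, logdet = np.linalg.slogdet(k)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + y @ np.linalg.solve(k, y))
```

With m equal to n - 1 every point conditions on all earlier points, so the approximation reduces to the exact likelihood; smaller m trades accuracy for a cost that grows roughly linearly in n rather than cubically. A call such as `vecchia_loglik(y, locs, variance=1.0, range_=0.2, nugget=0.1, m=30)` could then be passed to a generic optimiser to estimate the covariance parameters.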