Argo is an international program that collects temperature and salinity observations in the upper two kilometers of the global ocean. Most existing approaches for modeling Argo temperature rely on spatial partitioning, where data are locally modeled by first estimating a prescribed mean structure and then fitting Gaussian processes (GPs) to the mean-subtracted anomalies. Such strategies introduce challenges in designing suitable mean structures and defining domain partitions, often resulting in ad hoc modeling choices. In this work, we propose a one-stop Gaussian process regression framework with a generic spatio-temporal covariance function to jointly model Argo temperature data across broad spatial domains. Our fully data-driven approach achieves superior predictive performance compared with methods that require domain partitioning or parametric regression. To ensure scalability over large spatial regions, we employ the Vecchia approximation, which reduces the computational complexity from cubic to quasi-linear in the number of observations while preserving predictive accuracy. Using Argo data from January to March over the years 2007-2016, the same dataset used in prior benchmark studies, we demonstrate that our approach provides a principled, scalable, and interpretable tool for large-scale oceanographic analysis.
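To illustrate the complexity reduction attributed to the Vecchia approximation, the following is a minimal sketch and not the implementation used in this work. It replaces the joint Gaussian density with a product of conditionals, each conditioned on at most `m` nearest previously ordered observations, so each term costs O(m^3) rather than requiring an n-by-n factorization. The function names, the squared-exponential kernel, the zero mean, the ordering of points as given, and the `nugget` value are all assumptions made for illustration; in particular, the kernel is a stand-in and not the spatio-temporal covariance function proposed here.

```python
import numpy as np


def sq_exp_cov(a, b, variance=1.0, length_scale=1.0):
    """Squared-exponential covariance between rows of a and b (illustrative stand-in kernel)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)


def vecchia_loglik(x, y, m, cov=sq_exp_cov, nugget=1e-2):
    """Vecchia approximation to a zero-mean GP log-likelihood.

    x: (n, d) coordinates (a time column may be included), y: (n,) observations.
    Each observation is conditioned on its m nearest *earlier* points in the given order.
    """
    n = len(y)
    total = 0.0
    for i in range(n):
        prior_var = cov(x[i:i + 1], x[i:i + 1])[0, 0] + nugget
        if i == 0 or m == 0:
            mean, var = 0.0, prior_var
        else:
            # Brute-force neighbour search for clarity; a scalable implementation
            # would use a spatial index so this step is not O(n) per point.
            dist = np.linalg.norm(x[:i] - x[i], axis=1)
            idx = np.argsort(dist)[:m]
            k_nn = cov(x[idx], x[idx]) + nugget * np.eye(len(idx))
            k_in = cov(x[i:i + 1], x[idx])[0]
            w = np.linalg.solve(k_nn, k_in)      # O(m^3) solve per observation
            mean = w @ y[idx]                    # conditional mean
            var = prior_var - w @ k_in           # conditional variance
        total += -0.5 * (np.log(2.0 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total


def exact_loglik(x, y, cov=sq_exp_cov, nugget=1e-2):
    """Exact zero-mean GP log-likelihood via an O(n^3) Cholesky factorization."""
    n = len(y)
    chol = np.linalg.cholesky(cov(x, x) + nugget * np.eye(n))
    alpha = np.linalg.solve(chol, y)
    return -0.5 * alpha @ alpha - np.log(np.diag(chol)).sum() - 0.5 * n * np.log(2.0 * np.pi)
```

When `m` is set to `n - 1`, every observation conditions on all earlier ones and the product of conditionals recovers the exact likelihood; smaller `m` trades accuracy for cost, and the quality of that trade depends on the ordering and neighbor selection, which this sketch does not tune.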