In geostatistical problems with massive sample sizes, Gaussian processes can be approximated using sparse directed acyclic graphs to achieve scalable $O(n)$ computational complexity. In these models, data at each location are typically assumed to be conditionally dependent on a small set of parents, which usually include a subset of the nearest neighbors. These methodologies often exhibit excellent empirical performance, but the lack of theoretical validation leaves little guidance for specifying the underlying graphical model and can make results sensitive to the choice of graph. We address these issues by introducing radial neighbors Gaussian processes (RadGP), a class of Gaussian processes based on directed acyclic graphs in which directed edges connect every location to all of its neighbors within a predetermined radius. We prove that any radial neighbors Gaussian process can accurately approximate the corresponding unrestricted Gaussian process in Wasserstein-2 distance, with an error rate determined by the approximation radius, the spatial covariance function, and the spatial dispersion of samples. We further validate our approach empirically through applications to simulated and real-world data, which show excellent performance in both prior and posterior approximations to the original Gaussian process.
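To make the idea of radius-based parent sets and the Wasserstein-2 comparison concrete, the following is a minimal sketch under stated assumptions. It is a generic Vecchia-style DAG factorization, not the exact RadGP graph construction from the paper. It assumes a fixed ordering of locations, defines the parents of each location as all earlier-ordered locations within `radius`, and uses an exponential covariance. All function names (`exp_cov`, `radial_parents`, `dag_precision`, `psd_sqrt`, `w2_gaussian`) are hypothetical. Each location is regressed on its parents to get the rows of a lower-triangular matrix $B$ and the conditional variances $F$, giving the precision $(I - B)^\top F^{-1} (I - B)$. The result is then compared with the full covariance using the closed-form Wasserstein-2 distance between zero-mean Gaussians, $W_2^2 = \operatorname{tr}\Sigma_1 + \operatorname{tr}\Sigma_2 - 2\operatorname{tr}\big(\Sigma_2^{1/2}\Sigma_1\Sigma_2^{1/2}\big)^{1/2}$.

```python
import numpy as np


def exp_cov(X, Y, sigma2=1.0, phi=1.0):
    """Exponential covariance sigma2 * exp(-phi * ||x - y||); the kernel choice is an assumption."""
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)


def radial_parents(coords, radius):
    """Hypothetical parent rule: every earlier-ordered location within `radius`."""
    parents = []
    for i in range(len(coords)):
        d = np.linalg.norm(coords[:i] - coords[i], axis=1)
        parents.append(np.flatnonzero(d <= radius))
    return parents


def dag_precision(coords, radius, **kern):
    """Precision (I - B)^T F^{-1} (I - B) of the DAG-factorized Gaussian process.

    Dense matrices are used for clarity only; a scalable O(n) version would
    store B sparsely and use a spatial index to find neighbors.
    """
    n = len(coords)
    B = np.zeros((n, n))  # row i: regression weights of w_i on its parents
    F = np.zeros(n)       # conditional variances Var(w_i | parents)
    for i, pa in enumerate(radial_parents(coords, radius)):
        xi = coords[i:i + 1]
        c_ii = exp_cov(xi, xi, **kern)[0, 0]
        if pa.size == 0:
            F[i] = c_ii  # no parents: marginal variance
            continue
        C_pp = exp_cov(coords[pa], coords[pa], **kern)
        c_pi = exp_cov(coords[pa], xi, **kern)[:, 0]
        b = np.linalg.solve(C_pp, c_pi)  # kriging weights on the parents
        B[i, pa] = b
        F[i] = c_ii - c_pi @ b
    I_B = np.eye(n) - B
    return I_B.T @ (I_B / F[:, None])  # (I - B)^T diag(1/F) (I - B)


def psd_sqrt(A):
    """Symmetric square root of a positive semidefinite matrix via eigh."""
    w, V = np.linalg.eigh((A + A.T) / 2)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def w2_gaussian(S1, S2):
    """Wasserstein-2 distance between N(0, S1) and N(0, S2)."""
    r2 = psd_sqrt(S2)
    cross = psd_sqrt(r2 @ S1 @ r2)
    val = np.trace(S1) + np.trace(S2) - 2.0 * np.trace(cross)
    return float(np.sqrt(max(val, 0.0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.uniform(size=(300, 2))
    Sigma = exp_cov(coords, coords, phi=5.0)
    for radius in (0.05, 0.1, 0.2):
        Sigma_dag = np.linalg.inv(dag_precision(coords, radius, phi=5.0))
        print(radius, w2_gaussian(Sigma, Sigma_dag))
```

When the radius exceeds the diameter of the domain, every earlier location is a parent and the factorization reproduces the full covariance exactly. Shrinking the radius makes the graph sparser, and the loop above simply reports how the Wasserstein-2 discrepancy changes as a result. It is not a reproduction of the paper's error bound or experiments.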