Gaussian processes (GPs) are widely used to model dependencies in spatial statistics and machine learning. However, exact inference for GP regression has time complexity $O(n^3)$ in the number of observations $n$, which is prohibitive for large datasets. The Vecchia approximation makes computation scalable by introducing sparsity into the spatial dependency structure, represented by a directed acyclic graph (DAG). Despite its practical popularity, this approach lacks rigorous theoretical foundations, and the choice of DAG structure remains an open problem. In this paper, we systematically study the Vecchia approximation of the popular isotropic Matérn GP as a standalone stochastic process and uncover its key probabilistic and statistical properties. We propose selecting the parent sets in the Vecchia approximation as norming sets of fixed cardinality. On the probabilistic side, we show that the conditional distributions of Matérn GPs, as well as those of their Vecchia approximations, can be characterized by polynomial interpolations. This enables us to establish several results on the small ball probabilities and the reproducing kernel Hilbert spaces (RKHSs) of Vecchia GPs. Building on these probabilistic results, we prove that in the nonparametric regression model the corresponding posterior contracts around the truth at the minimax-optimal rate, under both oracle rescaling and hierarchical tuning of the prior. We illustrate the theoretical findings through numerical experiments on synthetic datasets. Our core algorithms are implemented in C++ with an R interface.
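To make the idea of a sparse DAG concrete, the following is a minimal Python sketch of a Vecchia log-likelihood, not the paper's C++/R implementation. It rests on several illustrative assumptions: the function names `matern12`, `vecchia_loglik`, and `exact_loglik` are hypothetical, the kernel is the Matérn kernel with smoothness 1/2 (the exponential kernel), and each point's parent set is taken to be its `m` nearest neighbours among previously ordered points, a common heuristic that stands in for the norming-set selection proposed here.

```python
import numpy as np


def matern12(a, b, lengthscale=1.0):
    """Matérn kernel with smoothness 1/2 (exponential kernel), unit variance."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-d / lengthscale)


def vecchia_loglik(x, y, m, lengthscale=1.0, jitter=1e-10):
    """Vecchia log-likelihood of zero-mean GP values y at locations x (n, d).

    The joint density is replaced by a product of univariate conditionals,
    each conditioning on at most m parents (assumed: nearest earlier points).
    """
    total = 0.0
    for i in range(len(y)):
        k_ii = 1.0 + jitter  # prior variance at x[i]
        mean, var = 0.0, k_ii
        if i > 0:
            # Parent set: the m nearest points among those ordered before i.
            dist = np.linalg.norm(x[:i] - x[i], axis=1)
            pa = np.argsort(dist)[:m]
            K_pp = matern12(x[pa], x[pa], lengthscale) + jitter * np.eye(len(pa))
            k_ip = matern12(x[i:i + 1], x[pa], lengthscale)[0]
            w = np.linalg.solve(K_pp, k_ip)  # kriging weights
            mean = w @ y[pa]
            var = k_ii - w @ k_ip
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total


def exact_loglik(x, y, lengthscale=1.0, jitter=1e-10):
    """Exact zero-mean GP log-likelihood, for comparison (O(n^3) cost)."""
    n = len(y)
    K = matern12(x, x, lengthscale) + jitter * np.eye(n)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(K, y))
```

Each conditional requires solving only an `m`-by-`m` linear system, so with `m` fixed the linear algebra per point no longer grows with `n` (the brute-force neighbour search used here for brevity still does). When `m` is at least `n - 1`, every point conditions on all of its predecessors and the product of conditionals recovers the exact joint density by the chain rule.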