Gaussian processes provide a flexible framework for spatial prediction, but their computational cost limits their applicability to data with a large sample size $n$. Predictive processes (PPs), a popular low-rank approximation, mitigate this burden by projecting the original process onto a reduced set of $m\ll n$ inducing points. However, existing theory requires $m$ to grow with $n$, creating a trade-off between accuracy and computational efficiency. We address this challenge by introducing an ensemble of PPs based on spatial partitioning, and we propose a novel partitioning and patching scheme with desirable properties. Generalizing the convergence results for PPs allows us to balance scalability and accuracy explicitly: increasing the number of ensemble components slows convergence but substantially improves computational efficiency. We further show theoretically that PPs with fixed $m$, despite their limited approximation accuracy, are asymptotically robust to data contamination. Motivated by this insight, we finally introduce a multi-resolution ensemble that combines fixed-$m$ PPs with multiple ensembles defined over possibly overlapping partitions ranging from coarse to fine. Simulations and large-scale geostatistical applications demonstrate that our approach delivers accurate, robust predictions at reduced computational cost, providing a practical and broadly applicable solution for spatial prediction.
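To make the low-rank projection concrete, the following is a minimal sketch of a single predictive-process predictor, not the ensemble or the partitioning and patching scheme proposed here. It assumes a one-dimensional domain, an exponential covariance, Gaussian noise with known variance, and evenly spaced inducing points; the function names (`exp_kernel`, `pp_predict_mean`, `full_gp_predict_mean`) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def exp_kernel(a, b, length_scale=0.3):
    """Exponential covariance between 1-D location arrays a and b (assumed kernel)."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / length_scale)


def pp_predict_mean(x, y, knots, x_new, noise_var=0.1):
    """Posterior mean of a predictive process with inducing points `knots`.

    Only an m x m system is solved, so the cost is O(n m^2) rather than O(n^3).
    """
    K_uu = exp_kernel(knots, knots)   # m x m covariance among inducing points
    K_ux = exp_kernel(knots, x)       # m x n cross-covariance to the data
    K_su = exp_kernel(x_new, knots)   # cross-covariance to prediction sites
    A = K_uu + K_ux @ K_ux.T / noise_var
    return K_su @ np.linalg.solve(A, K_ux @ y / noise_var)


def full_gp_predict_mean(x, y, x_new, noise_var=0.1):
    """Posterior mean of the full Gaussian process, for comparison (O(n^3))."""
    K = exp_kernel(x, x)
    return exp_kernel(x_new, x) @ np.linalg.solve(K + noise_var * np.eye(len(x)), y)


# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=x.size)
x_new = np.linspace(0.0, 1.0, 5)

knots = np.linspace(0.0, 1.0, 8)      # m = 8 inducing points, n = 30 observations
pp_mean = pp_predict_mean(x, y, knots, x_new)
gp_mean = full_gp_predict_mean(x, y, x_new)
```

In this sketch, choosing the inducing points to coincide with all observed locations (`knots = x`) recovers the full Gaussian process mean up to numerical error, which is one way to see that the approximation error comes from taking $m$ smaller than $n$.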