Gaussian processes (GPs) are flexible, probabilistic, nonparametric models widely used in fields such as spatial statistics and machine learning. Their main drawback is computational cost: exact inference requires $O(N^3)$ time and $O(N^2)$ memory for $N$ data points, which is prohibitive for large data sets. Numerous approximation techniques have been proposed to address this limitation. In this work, we systematically compare the accuracy of different Gaussian process approximations with respect to likelihood evaluation, parameter estimation, and prediction, explicitly accounting for the computational time required. We analyze the trade-off between accuracy and runtime on multiple simulated and large-scale real-world data sets and find that Vecchia approximations provide the best accuracy-runtime trade-off in most of the settings considered.
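To make the cost argument concrete, the following is a minimal sketch contrasting an exact GP log-likelihood, which relies on a dense Cholesky factorization, with a basic Vecchia-style approximation that conditions each observation on at most `m` previously ordered nearest neighbors. It is an illustration under stated assumptions, not the implementation evaluated in this work: the exponential kernel, the fixed hyperparameters, the nugget, the given point ordering, the brute-force neighbor search, and the function names `exp_kernel`, `exact_loglik`, and `vecchia_loglik` are all hypothetical choices made for the example.

```python
import numpy as np


def exp_kernel(a, b, variance=1.0, length_scale=0.3):
    """Exponential covariance between the rows of a and b (illustrative choice)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return variance * np.exp(-d / length_scale)


def exact_loglik(x, y, nugget=1e-2):
    """Exact zero-mean GP log-likelihood: O(N^3) time, O(N^2) memory."""
    n = len(y)
    K = exp_kernel(x, x) + nugget * np.eye(n)  # dense N x N covariance
    L = np.linalg.cholesky(K)                  # the O(N^3) step
    z = np.linalg.solve(L, y)                  # z = L^{-1} y
    return -0.5 * z @ z - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)


def vecchia_loglik(x, y, m=10, nugget=1e-2):
    """Vecchia-style approximation: sum_i log p(y_i | y_{c(i)}), with c(i) the
    (at most) m nearest points among those ordered before i.

    Each term needs only an m x m solve. The neighbor search below is brute
    force for brevity; practical implementations use faster search structures.
    """
    total = 0.0
    for i in range(len(y)):
        prior_var = exp_kernel(x[i:i + 1], x[i:i + 1])[0, 0] + nugget
        if i == 0:
            mean, var = 0.0, prior_var         # first point has no conditioning set
        else:
            d = np.linalg.norm(x[:i] - x[i], axis=1)
            nb = np.argsort(d)[:m]             # indices of the conditioning set
            K_nn = exp_kernel(x[nb], x[nb]) + nugget * np.eye(len(nb))
            k_in = exp_kernel(x[i:i + 1], x[nb])[0]
            w = np.linalg.solve(K_nn, k_in)    # kriging weights
            mean = w @ y[nb]                   # conditional mean
            var = prior_var - w @ k_in         # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total
```

When `m` is at least $N-1$, every observation is conditioned on all of its predecessors, so the sum of conditional log-densities reproduces the exact log-likelihood up to floating-point error. Smaller values of `m` trade accuracy for runtime, which is the kind of trade-off the comparison examines.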