The computational cost of inference and prediction for statistical models based on Gaussian processes with Matérn covariance functions scales cubically with the number of observations, which limits their applicability to large data sets. The cost can be reduced in certain special cases, but there are currently no generally applicable exact methods with linear cost. Several approximate methods have been introduced to reduce the cost, but most of these lack theoretical guarantees on their accuracy. We consider Gaussian processes on bounded intervals with Matérn covariance functions and, for the first time, develop a generally applicable method with linear cost and a covariance error that decreases exponentially fast in the order $m$ of the proposed approximation. The method is based on an optimal rational approximation of the spectral density and results in an approximation that can be represented as a sum of $m$ independent Gaussian Markov processes, which makes it straightforward to implement efficiently in general-purpose software for statistical inference. In addition to the theoretical justification, we demonstrate the accuracy empirically through carefully designed simulation studies, which show that the method outperforms all state-of-the-art alternatives in terms of accuracy for a fixed computational cost in statistical tasks such as Gaussian process regression.
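To illustrate why a representation in terms of Gaussian Markov processes can lead to linear cost, the following minimal sketch looks at the simplest Matérn case, smoothness $\nu = 1/2$, where the covariance is exponential and the process is Markov. It checks numerically that the precision matrix (inverse covariance) at sorted locations is tridiagonal, so that banded solvers could be applied in time linear in the number of observations. This is not the rational approximation proposed in the abstract; the function names, the parameter values `sigma2` and `kappa`, and the grid are assumptions chosen for illustration only.

```python
import numpy as np


def exponential_cov(t, sigma2=1.0, kappa=2.0):
    """Matern covariance with nu = 1/2: sigma2 * exp(-kappa * |s - t|).

    sigma2 and kappa are illustrative values, not taken from the paper.
    """
    return sigma2 * np.exp(-kappa * np.abs(t[:, None] - t[None, :]))


def max_off_band(Q, bandwidth=1):
    """Largest absolute entry of Q lying outside the given bandwidth."""
    i, j = np.indices(Q.shape)
    mask = np.abs(i - j) > bandwidth
    return np.max(np.abs(Q[mask]))


# Sorted, irregularly spaced locations on the bounded interval [0, 1].
t = np.linspace(0.0, 1.0, 40) ** 1.5

C = exponential_cov(t)   # dense covariance matrix
Q = np.linalg.inv(C)     # precision matrix (dense inverse, for checking only)

# Entries outside the tridiagonal band, relative to the largest diagonal entry.
ratio = max_off_band(Q) / np.max(np.abs(np.diag(Q)))
print(f"relative size of entries outside the tridiagonal band: {ratio:.2e}")
```

The dense inverse is used here only to verify the band structure. In practice one would assemble the tridiagonal precision matrix directly and use a banded or sparse Cholesky factorization. A sum of $m$ such Markov components, as described in the abstract, would presumably be handled in a similar banded fashion, although the details depend on the construction in the paper.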