Gaussian process (GP) models provide a flexible framework for prediction and uncertainty quantification. For most covariance functions, however, exact GP prediction with $n$ points scales as $\mathcal{O}(n^3)$, making it prohibitively expensive for large datasets or large numbers of prediction points. While nearest-neighbor-based prediction can work well in certain settings, even non-pathological circumstances (for example, measurement noise) can severely restrict its efficiency. This work presents a complementary approach in which one conditions on carefully designed linear combinations of the data, which is particularly effective when predicting many values in large connected regions of the data domain. For kernel functions that are smooth away from the origin, conditioning on a small number $r$ of such data contrasts can reproduce the full exact conditional distributions to machine precision. Computing these contrasts costs $\mathcal{O}(T r^2)$ work, where $T$ is the cost of solving a linear system with the data covariance matrix, and so in many cases they can be computed in linear or near-linear cost by exploiting rank structure in well-behaved covariance matrices. At the cost of $\mathcal{O}(nr^2)$ additional precomputation work, this approach can also provide predictions at arbitrary points of a designated region in $\mathcal{O}(1)$ online work, making it particularly attractive for problems where prediction points are not known in advance.
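To make the idea of conditioning on linear combinations of data concrete, the following is a minimal NumPy sketch under stated assumptions; it is not the construction used in this work. The contrast weights `W` are an illustrative choice: the data covariance inverse applied to the top `r` right singular vectors of the prediction-to-data cross-covariance. The Matérn-3/2 kernel, the function names, and all parameter values are likewise hypothetical. The sketch uses dense solves throughout, so it only illustrates accuracy and does not achieve the reduced costs described above.

```python
import numpy as np


def matern32(a, b, length_scale=0.3):
    """Matern-3/2 covariance between 1-D location arrays a and b.

    This kernel is not smooth at zero distance but is smooth away from it.
    """
    s = np.sqrt(3.0) * np.abs(a[:, None] - b[None, :]) / length_scale
    return (1.0 + s) * np.exp(-s)


def compare_predictions(x, y, x_pred, r, noise_var=0.01):
    """Return exact and contrast-based conditional means and covariances.

    ASSUMPTION: the contrasts z = W.T @ y use W = Sigma^{-1} V_r, where V_r
    holds the top-r right singular vectors of the cross-covariance. This is
    an illustrative choice, not necessarily the one made in the paper.
    """
    Sigma = matern32(x, x) + noise_var * np.eye(x.size)  # data covariance
    K_pd = matern32(x_pred, x)                           # prediction-data
    K_pp = matern32(x_pred, x_pred)                      # prediction-prediction

    # Exact GP conditional distribution (dense, cubic in n).
    mean_exact = K_pd @ np.linalg.solve(Sigma, y)
    cov_exact = K_pp - K_pd @ np.linalg.solve(Sigma, K_pd.T)

    # Build r contrasts from a truncated SVD of the cross-covariance.
    _, _, Vt = np.linalg.svd(K_pd, full_matrices=False)
    V_r = Vt[:r].T                    # n x r
    W = np.linalg.solve(Sigma, V_r)   # n x r contrast weights
    z = W.T @ y                       # the r data contrasts

    # Condition on z only: Cov(z) = W^T Sigma W, Cov(f_pred, z) = K_pd W.
    C_zz = W.T @ Sigma @ W
    C_pz = K_pd @ W
    mean_contrast = C_pz @ np.linalg.solve(C_zz, z)
    cov_contrast = K_pp - C_pz @ np.linalg.solve(C_zz, C_pz.T)
    return mean_exact, cov_exact, mean_contrast, cov_contrast


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 1.0, 200))            # data locations
    y = np.sin(6.0 * x) + 0.1 * rng.standard_normal(200)
    x_pred = np.linspace(1.1, 1.5, 50)                 # region away from data
    m_e, c_e, m_c, c_c = compare_predictions(x, y, x_pred, r=2)
    print("max mean difference:", np.max(np.abs(m_e - m_c)))
    print("max covariance difference:", np.max(np.abs(c_e - c_c)))
```

In this particular 1-D setup every prediction point lies to the right of all data points, so the Matérn-3/2 cross-covariance factors into two separable terms and `r=2` contrasts already suffice. In other geometries or dimensions the required `r` would generally be larger and would need to be chosen from the decay of the singular values.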