Gaussian processes (GPs) scale prohibitively with the size of the dataset. In response, many approximation methods have been developed, which inevitably introduce approximation error. This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior. Therefore, in practice, GP models are often as much about the approximation method as they are about the data. Here, we develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended. The most common GP approximations, such as methods based on the Cholesky factorization, conjugate gradients, and inducing points, each map to an instance in this class. For any method in this class, we prove (i) convergence of its posterior mean in the associated RKHS, (ii) decomposability of its combined posterior covariance into mathematical and computational covariances, and (iii) that the combined variance is a tight worst-case bound for the squared error between the method's posterior mean and the latent function. Finally, we empirically demonstrate the consequences of ignoring computational uncertainty and show how implicitly modeling it improves generalization performance on benchmark datasets.
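The decomposition in claim (ii) can be illustrated numerically. What follows is a minimal sketch under stated assumptions, not the authors' implementation. It assumes an RBF kernel with unit prior variance, synthetic data, and an approximate precision matrix of the projection form C = S (Sᵀ K̂ S)⁻¹ Sᵀ, where K̂ is the kernel matrix plus noise and the columns of S describe which part of the data the computation has processed. Taking S as the first m unit vectors is equivalent to conditioning the GP on the first m data points only. All names (`rbf_kernel`, `combined_uncertainty`, `S`, `noise_var`) are hypothetical and chosen for illustration.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """RBF kernel with unit prior variance (an assumption of this sketch)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def combined_uncertainty(X, y, X_test, S, noise_var=0.1):
    """Return the approximate posterior mean and the mathematical,
    computational, and combined marginal variances at X_test.

    S is an (n, m) matrix with linearly independent columns.
    The projection form of C is assumed here for illustration.
    """
    n = X.shape[0]
    K_hat = rbf_kernel(X, X) + noise_var * np.eye(n)
    K_star = rbf_kernel(X_test, X)
    prior_var = np.ones(X_test.shape[0])  # k(x, x) = 1 for this kernel

    # Approximate precision matrix built from limited computation.
    C = S @ np.linalg.solve(S.T @ K_hat @ S, S.T)
    # Exact precision matrix, formed only to exhibit the decomposition.
    K_hat_inv = np.linalg.inv(K_hat)

    def quad(M):
        # Diagonal of K_star @ M @ K_star.T
        return np.einsum("ij,jk,ik->i", K_star, M, K_star)

    mean = K_star @ (C @ y)
    math_var = prior_var - quad(K_hat_inv)   # uncertainty from finite data
    comp_var = quad(K_hat_inv - C)           # uncertainty from finite computation
    combined_var = prior_var - quad(C)       # what the approximate method reports
    return mean, math_var, comp_var, combined_var


# Synthetic data: 30 noisy observations of a sine function.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
X_test = np.linspace(0.0, 5.0, 7)[:, None]

# Limited computation: only the first 5 data points are processed.
S = np.eye(30)[:, :5]
mean, math_var, comp_var, combined_var = combined_uncertainty(X, y, X_test, S)

print("mathematical :", np.round(math_var, 3))
print("computational:", np.round(comp_var, 3))
print("combined     :", np.round(combined_var, 3))
```

In this sketch the combined variance equals the sum of the mathematical and computational variances by construction, and the computational part is non-negative because K̂⁻¹ − C is positive semi-definite for this choice of C. Passing `S = np.eye(30)` spends the full computation, so the computational variance vanishes and the combined variance reduces to the exact posterior variance.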