Gaussian processes scale prohibitively with the size of the dataset. In response, many approximation methods have been developed, which inevitably introduce approximation error. This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior. Therefore, in practice, GP models are often as much about the approximation method as they are about the data. Here, we develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data points observed and the finite amount of computation expended. The most common GP approximations, such as methods based on the Cholesky factorization, conjugate gradients, and inducing points, map to instances of this class. For any method in this class, we prove (i) convergence of its posterior mean in the associated RKHS, (ii) decomposability of its combined posterior covariance into mathematical and computational covariances, and (iii) that the combined variance is a tight worst-case bound for the squared error between the method's posterior mean and the latent function. Finally, we empirically demonstrate the consequences of ignoring computational uncertainty and show how implicitly modeling it improves generalization performance on benchmark datasets.
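To make claim (ii) concrete, the following is a minimal NumPy sketch of how a combined posterior covariance can split into a mathematical and a computational part. It rests on an assumption that the abstract does not spell out. The approximation is taken to replace the exact precision matrix $\hat{K}^{-1}$, where $\hat{K} = K + \sigma^2 I$, with a low-rank matrix $C = S (S^\top \hat{K} S)^{-1} S^\top$ built from a matrix of "actions" $S$. The RBF kernel, the noise level, and the names `rbf` and `combined_posterior` are hypothetical choices for illustration and are not the method's actual API. Unit-vector actions loosely mimic a partial Cholesky factorization. When all $n$ unit vectors are used, $C = \hat{K}^{-1}$ and the computational covariance vanishes.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    # Squared-exponential kernel on 1-D inputs (hypothetical choice).
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def combined_posterior(X, y, Xs, S, noise=0.1):
    """Sketch: posterior at test points Xs, given training data (X, y) and
    an assumed action matrix S of shape (n, i)."""
    K_hat = rbf(X, X) + noise * np.eye(len(X))
    # Assumed low-rank approximation of K_hat^{-1} from the actions S.
    C = S @ np.linalg.solve(S.T @ K_hat @ S, S.T)
    k_sX = rbf(Xs, X)
    k_ss = rbf(Xs, Xs)
    mean = k_sX @ (C @ y)
    K_inv = np.linalg.inv(K_hat)  # exact inverse, only to expose the split
    # Mathematical covariance: the exact GP posterior covariance.
    math_cov = k_ss - k_sX @ K_inv @ k_sX.T
    # Computational covariance: the extra uncertainty from using C instead of K_inv.
    comp_cov = k_sX @ (K_inv - C) @ k_sX.T
    # Combined covariance, computed without K_inv; equals math_cov + comp_cov.
    combined = k_ss - k_sX @ C @ k_sX.T
    return mean, math_cov, comp_cov, combined

rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
Xs = np.linspace(0.0, 5.0, 7)
S = np.eye(20)[:, :5]  # five unit-vector actions: a partial, Cholesky-like budget
mean, math_cov, comp_cov, combined = combined_posterior(X, y, Xs, S)
```

Because $\hat{K}^{-1} - C$ is positive semidefinite for this construction, the combined variance at each test point is never smaller than the exact (mathematical) posterior variance. Spending less computation therefore shows up as additional reported uncertainty instead of being silently dropped.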