Owing to their flexibility and theoretical tractability, Gaussian process (GP) regression models have become a central topic in modern statistics and machine learning. While the true posterior in these models is available in closed form, its numerical evaluation depends on inverting the augmented kernel matrix $ K + \sigma^2 I $, which requires up to $ O(n^3) $ operations. For the large sample sizes n typical of modern applications, this is computationally infeasible and necessitates an approximate version of the posterior. Although such approximation methods are widely used in practice, they typically have very limited theoretical underpinning. In this context, we analyze a class of recently proposed approximation algorithms from the field of probabilistic numerics. They can be interpreted in terms of Lanczos approximate eigenvectors of the kernel matrix or a conjugate gradient approximation of the posterior mean. These methods are particularly advantageous in truly large-scale applications, as they are fundamentally based only on matrix-vector multiplications, which are amenable to the GPU acceleration of modern software frameworks. We combine results from the numerical analysis literature with state-of-the-art concentration results for spectra of kernel matrices to obtain minimax contraction rates. Our theoretical findings are illustrated by numerical experiments.
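The conjugate-gradient idea above can be sketched in a few lines: instead of forming $(K + \sigma^2 I)^{-1} y$ directly at $O(n^3)$ cost, one solves $(K + \sigma^2 I)\alpha = y$ iteratively, touching $K$ only through matrix-vector products. The following is a minimal NumPy illustration, not the algorithm analyzed in the paper; the RBF kernel, toy data, and all parameter values are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Squared-exponential kernel: k(x, y) = exp(-|x - y|^2 / (2 * lengthscale^2)).
    d2 = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))

def cg_posterior_mean_weights(K, y, sigma2, max_iter=500, tol=1e-10):
    # Conjugate gradients for (K + sigma^2 I) alpha = y. The posterior mean
    # at new inputs X_new is then k(X_new, X) @ alpha. The only access to K
    # is the matvec below, which is what makes the approach GPU-friendly.
    alpha = np.zeros_like(y)
    r = y.copy()              # residual y - (K + sigma^2 I) alpha
    p = r.copy()              # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = K @ p + sigma2 * p
        step = rs / (p @ Ap)
        alpha += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return alpha

# Toy example: noisy observations of a smooth function.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
sigma2 = 0.1 ** 2

K = rbf_kernel(X, X)
alpha = cg_posterior_mean_weights(K, y, sigma2)

# Sanity check against the exact O(n^3) solve (feasible here since n is small).
alpha_exact = np.linalg.solve(K + sigma2 * np.eye(len(y)), y)
print(np.max(np.abs(alpha - alpha_exact)))  # agrees closely with the exact solve
```

Because kernel matrices typically have rapidly decaying spectra, CG converges in far fewer iterations than the dimension suggests; stopping after $m \ll n$ iterations is precisely the source of the approximation error the paper's contraction-rate analysis has to control.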