Representing Additive Gaussian Processes by Sparse Matrices

Among generalized additive models, additive Matérn Gaussian processes (GPs) are among the most popular for scalable high-dimensional problems. Thanks to their additive structure and stochastic differential equation representation, back-fitting-based algorithms can reduce the time complexity of computing the posterior mean from $O(n^3)$ to $O(n\log n)$, where $n$ is the data size. However, generalizing these algorithms to efficiently compute the posterior variance and maximum log-likelihood remains an open problem. In this study, we demonstrate that for additive Matérn GPs, not only the posterior mean but also the posterior variance, the log-likelihood, and the gradients of these three functions can be represented by formulas involving only sparse matrices and sparse vectors. We show how to use these sparse formulas to generalize back-fitting-based algorithms so that the posterior mean, posterior variance, log-likelihood, and the gradients of these three functions can all be computed in $O(n \log n)$ time for additive GPs. We apply our algorithms to Bayesian optimization and propose efficient algorithms for posterior updates, hyperparameter learning, and computation of the acquisition function and its gradient. Given the posterior, our algorithms reduce the time complexity of computing the acquisition function and its gradient from $O(n^2)$ to $O(\log n)$ for a general learning rate, and to $O(1)$ for a small learning rate.
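The back-fitting idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses dense matrices (so each sweep costs $O(n^2)$ rather than $O(n \log n)$), assumes a Matérn-1/2 kernel with a fixed lengthscale in every dimension, and the names `matern12` and `backfit_posterior_mean` are hypothetical. Each sweep updates one additive component by smoothing the residual left by the others; the fixed point is the exact posterior mean of each component at the training inputs.

```python
import numpy as np

def matern12(x, lengthscale=0.2):
    """Matérn-1/2 (exponential) kernel matrix for 1-D inputs x (assumed kernel)."""
    return np.exp(-np.abs(x[:, None] - x[None, :]) / lengthscale)

def backfit_posterior_mean(X, y, noise_var=1.0, max_sweeps=5000, tol=1e-12):
    """Gauss-Seidel back-fitting for an additive GP with one kernel per dimension.

    Returns the per-dimension posterior means at the training inputs
    (shape D x n) and the list of kernel matrices.
    """
    n, D = X.shape
    kernels = [matern12(X[:, d]) for d in range(D)]
    # Dense smoother S_d = (K_d + noise_var * I)^{-1} K_d = K_d (K_d + noise_var * I)^{-1}.
    smoothers = [np.linalg.solve(K + noise_var * np.eye(n), K) for K in kernels]
    means = np.zeros((D, n))
    for _ in range(max_sweeps):
        max_change = 0.0
        for d in range(D):
            # Residual after removing every component except the d-th.
            residual = y - means.sum(axis=0) + means[d]
            updated = smoothers[d] @ residual
            max_change = max(max_change, np.max(np.abs(updated - means[d])))
            means[d] = updated
        if max_change < tol:
            break
    return means, kernels

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 3))
y = np.sin(4 * X[:, 0]) + X[:, 1] ** 2 - X[:, 2] + 0.1 * rng.standard_normal(40)

means, kernels = backfit_posterior_mean(X, y, noise_var=1.0)

# Direct O(n^3) reference: m_d = K_d (sum_j K_j + noise_var * I)^{-1} y.
alpha = np.linalg.solve(sum(kernels) + 1.0 * np.eye(40), y)
direct = np.stack([K @ alpha for K in kernels])
```

Comparing `means` with `direct` checks that the iterative scheme reproduces the dense solve; the speed-up described in the abstract comes from replacing the dense smoothers with sparse representations, which this sketch does not attempt.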

Related content

Gaussian process

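A Gaussian process (GP) is a type of stochastic process in probability theory and mathematical statistics: a collection of normally distributed random variables indexed by an index set. Any linear combination of the random variables in a Gaussian process is normally distributed, and every finite-dimensional distribution is jointly normal. On a continuous index set, the law of the process is a Gaussian measure over all of its random variables, so a Gaussian process is regarded as the infinite-dimensional generalization of the joint normal distribution. A Gaussian process is completely determined by its mean function and covariance function, and it inherits many properties of the normal distribution.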
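As a small illustration of the statement that a Gaussian process is fixed by its mean and covariance functions, the sketch below builds one finite-dimensional distribution and draws a sample from it. The zero mean, the exponential (Matérn-1/2) kernel, the input locations, and the name `exponential_kernel` are all assumptions chosen for the example. It also checks a standard fact about this particular kernel: for sorted 1-D inputs the inverse covariance (precision) matrix is tridiagonal, an elementary instance of sparsity in Matérn GPs. This is not claimed to be the sparse representation used in the paper.

```python
import numpy as np

def exponential_kernel(x, lengthscale=1.0):
    """Exponential (Matérn-1/2) covariance matrix for 1-D inputs x."""
    return np.exp(-np.abs(x[:, None] - x[None, :]) / lengthscale)

# Sorted, unevenly spaced input locations (illustrative values).
x = np.array([0.0, 0.3, 0.9, 1.4, 2.2, 2.5, 3.7, 4.1])

# The finite-dimensional distribution at x is N(mean, cov).
mean = np.zeros_like(x)
cov = exponential_kernel(x)

# One draw from that joint normal distribution.
rng = np.random.default_rng(1)
sample = rng.multivariate_normal(mean, cov)

# Precision matrix; keep only what lies outside the three central diagonals.
precision = np.linalg.inv(cov)
off_band = precision - np.triu(np.tril(precision, 1), -1)
max_off_band = np.max(np.abs(off_band))
```

Here `max_off_band` is zero up to floating-point error, reflecting the Markov property of the process with this kernel.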
