Bayesian optimization (BO) primarily uses Gaussian processes (GPs) as the surrogate model, typically with a simple stationary and separable kernel such as the widely used squared-exponential kernel with automatic relevance determination (SE-ARD). However, such simple kernel specifications are ill-suited to learning functions with complex features such as nonstationarity, nonseparability, and multimodality. Approximating such functions with a local GP requires a large number of samples even in a low-dimensional space, and more still in a high-dimensional setting. In this paper, we propose to use Bayesian Kernelized Tensor Factorization (BKTF) as a new surrogate model for BO in a D-dimensional Cartesian product space. Our key idea is to approximate the underlying D-dimensional solid with a fully Bayesian low-rank tensor CP decomposition, in which we place GP priors on the latent basis functions for each dimension to encode local consistency and smoothness. With this formulation, information from each sample can be shared not only with its neighbors but also across dimensions. Although BKTF no longer has an analytical posterior, we can still efficiently approximate the posterior distribution through Markov chain Monte Carlo (MCMC) and obtain predictions with full uncertainty quantification (UQ). We conduct numerical experiments on both standard BO test problems and machine learning hyperparameter tuning problems, and the results show that BKTF achieves superior sample efficiency.
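To make the low-rank construction in the abstract concrete, the following is a minimal sketch of a single draw from a CP-structured prior in which each dimension's latent basis functions are GP samples. It is not the authors' implementation. The function names (`se_kernel`, `cp_reconstruct`, `sample_cp_prior`), the grid discretization, the squared-exponential kernel with fixed hyperparameters, and the standard-normal component weights are all assumptions made for illustration. The MCMC posterior inference mentioned in the abstract is not shown.

```python
import numpy as np

def se_kernel(x, lengthscale=0.2, variance=1.0, jitter=1e-6):
    """Squared-exponential covariance matrix on a 1-D grid (hyperparameters assumed)."""
    diff = x[:, None] - x[None, :]
    K = variance * np.exp(-0.5 * (diff / lengthscale) ** 2)
    # Jitter on the diagonal keeps the Cholesky factorization numerically stable.
    return K + jitter * np.eye(len(x))

def cp_reconstruct(weights, factors):
    """Sum of rank-one tensors: sum_r weights[r] * u_r^(1) o ... o u_r^(D)."""
    tensor = 0.0
    for r in range(len(weights)):
        component = weights[r]
        for U in factors:
            # Outer product appends one tensor mode per dimension.
            component = np.multiply.outer(component, U[:, r])
        tensor = tensor + component
    return tensor

def sample_cp_prior(grids, rank=2, seed=0):
    """Draw one D-dimensional tensor from the illustrative CP prior."""
    rng = np.random.default_rng(seed)
    factors = []
    for x in grids:
        L = np.linalg.cholesky(se_kernel(x))
        # Each column is one GP draw: a smooth latent basis function on this dimension.
        factors.append(L @ rng.standard_normal((len(x), rank)))
    # Component weights are an assumption of this sketch, drawn as standard normal.
    weights = rng.standard_normal(rank)
    return cp_reconstruct(weights, factors), factors, weights

if __name__ == "__main__":
    grids = [np.linspace(0.0, 1.0, n) for n in (20, 15, 10)]
    tensor, factors, weights = sample_cp_prior(grids, rank=2)
    print(tensor.shape)
```

Because every entry of the tensor is built from the same per-dimension factors, an observation at one grid point constrains whole slices of the tensor. This is one way to read the abstract's remark that information is shared across dimensions as well as between neighbors.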