Bayesian Kernelized Tensor Factorization as Surrogate for Bayesian Optimization

Bayesian optimization (BO) primarily uses Gaussian processes (GPs) as the surrogate model, mostly with a simple stationary and separable kernel function such as the squared-exponential kernel with automatic relevance determination (SE-ARD). However, such simple kernel specifications are poorly suited to learning functions with complex features, such as nonstationarity, nonseparability, and multimodality. Approximating such functions with a local GP requires a large number of samples even in a low-dimensional space, and more still in a high-dimensional setting. In this paper, we propose Bayesian Kernelized Tensor Factorization (BKTF) as a new surrogate model for BO in a $D$-dimensional Cartesian product space. Our key idea is to approximate the underlying $D$-dimensional solid with a fully Bayesian low-rank tensor CANDECOMP/PARAFAC (CP) decomposition, in which we place GP priors on the latent basis functions for each dimension to encode local consistency and smoothness. With this formulation, information from each sample can be shared not only with neighbors but also across dimensions. Although BKTF no longer has an analytical posterior, we can still efficiently approximate the posterior distribution through Markov chain Monte Carlo (MCMC) and obtain predictions with full uncertainty quantification (UQ). We conduct numerical experiments on both standard BO test functions and machine learning hyperparameter tuning problems. The results show that BKTF offers a flexible and highly effective approach for characterizing complex functions with UQ, especially when the initial sample size and budget are severely limited.
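
To make the low-rank construction concrete, the following is a minimal sketch of a CP-structured function on a 3-dimensional grid whose latent factors are drawn from GP priors. It illustrates only the prior structure described in the abstract, not the MCMC inference used in the paper. The function names (`se_kernel`, `sample_cp_factors`, `cp_reconstruct`), the grid sizes, the rank, the lengthscale, and the per-component `weights` are all hypothetical choices made for illustration.

```python
import numpy as np

def se_kernel(x, lengthscale=0.2, variance=1.0):
    """Squared-exponential covariance matrix on 1-D grid points x."""
    diff = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def sample_cp_factors(grids, rank, rng, jitter=1e-6):
    """Draw one (n_d x rank) factor matrix per dimension.

    Each column is an independent sample from a zero-mean GP prior,
    which encodes smoothness along that dimension (illustrative hyperparameters).
    """
    factors = []
    for x in grids:
        K = se_kernel(x) + jitter * np.eye(len(x))  # jitter for numerical stability
        L = np.linalg.cholesky(K)
        factors.append(L @ rng.standard_normal((len(x), rank)))
    return factors

def cp_reconstruct(factors, weights):
    """Sum of weighted rank-one tensors built from the factor columns."""
    F = np.zeros(tuple(U.shape[0] for U in factors))
    for r, w in enumerate(weights):
        component = factors[0][:, r]
        for U in factors[1:]:
            component = np.multiply.outer(component, U[:, r])
        F += w * component
    return F

rng = np.random.default_rng(0)
grids = [np.linspace(0.0, 1.0, n) for n in (20, 15, 10)]  # assumed grid sizes
factors = sample_cp_factors(grids, rank=2, rng=rng)
F = cp_reconstruct(factors, weights=np.ones(2))
print(F.shape)  # (20, 15, 10)
```

Because every entry of `F` depends on one row from each factor matrix, an observation at a single grid point constrains whole slices of the tensor. This is one way to read the claim that information is shared across dimensions as well as between neighbors.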
