Thompson sampling (TS) is a simple yet effective stochastic policy for Bayesian decision making. It draws a sample from the posterior belief about the reward profile and optimizes that sample to obtain a candidate decision. In continuous optimization, the posterior of the objective function is often a Gaussian process (GP), whose sample paths have numerous local optima, making their global optimization challenging. In this work, we introduce an efficient global optimization strategy for GP-TS that carefully selects starting points for gradient-based multi-start optimizers. It identifies all local optima of the prior sample via univariate global rootfinding, and optimizes the posterior sample using a differentiable, decoupled representation. We demonstrate remarkable improvement in the global optimization of GP posterior samples, especially in high dimensions. This leads to dramatic improvements in the overall performance of Bayesian optimization using GP-TS acquisition functions, surprisingly outperforming alternatives like GP-UCB and EI.
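To make the pipeline concrete, the following is a minimal sketch of GP-TS with a decoupled (pathwise) posterior sample: the prior sample is approximated with random Fourier features, the data-dependent update is added in closed form, and the resulting differentiable sample is minimized by a gradient-based multi-start optimizer. This is an illustration under simplifying assumptions (an RBF kernel, a 1-D toy objective, and plain random restarts rather than the paper's rootfinding-based start selection); all names and hyperparameters here are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# --- Hypothetical setup: RBF kernel, 1-D objective, a few observations ---
ell, m, noise = 0.3, 500, 1e-6  # lengthscale, number of Fourier features, jitter

omega = rng.normal(0.0, 1.0 / ell, size=(m, 1))  # spectral frequencies (RBF)
b = rng.uniform(0.0, 2 * np.pi, size=m)
w = rng.normal(size=m)                            # prior weight sample

def phi(X):
    """Random Fourier feature map; X has shape (n, 1)."""
    return np.sqrt(2.0 / m) * np.cos(X @ omega.T + b)

def f_prior(X):
    """Approximate draw from the GP prior, evaluable anywhere."""
    return phi(X) @ w

def kernel(A, B):
    d2 = (A[:, None, 0] - B[None, :, 0]) ** 2
    return np.exp(-0.5 * d2 / ell**2)

# Observations of a toy objective
X_obs = np.array([[0.1], [0.4], [0.9]])
y_obs = np.sin(5 * X_obs[:, 0])

# Decoupled (pathwise) update:
#   f_post(x) = f_prior(x) + k(x, X) K^{-1} (y - f_prior(X))
K = kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
v = np.linalg.solve(K, y_obs - f_prior(X_obs))

def f_post(x):
    """Differentiable posterior sample path (here minimized numerically)."""
    x = np.atleast_2d(x)
    return (f_prior(x) + kernel(x, X_obs) @ v)[0]

# Multi-start gradient-based optimization of the posterior sample.
# The paper selects starts from local optima of the prior sample;
# uniform random starts are used here for brevity.
starts = rng.uniform(0.0, 1.0, size=20)
best = min((minimize(f_post, s, bounds=[(0.0, 1.0)]) for s in starts),
           key=lambda r: r.fun)
x_next = best.x  # TS candidate: (approximate) global minimizer of the sample
```

Because the posterior sample is an explicit differentiable function rather than a jointly drawn vector on a fixed grid, each restart costs only cheap function and gradient evaluations, which is what makes exhaustive multi-start optimization practical in higher dimensions.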