Gaussian process (GP) bandits provide a powerful framework for black-box optimization of unknown functions. The characteristics of the unknown function depend heavily on the assumed GP prior. Most work in the literature assumes that this prior is known, but in practice this seldom holds. Instead, practitioners often rely on maximum likelihood estimation to select the hyperparameters of the prior, a practice that lacks theoretical guarantees. In this work, we propose two algorithms for joint prior selection and regret minimization in GP bandits based on GP Thompson sampling (GP-TS): Prior-Elimination GP-TS (PE-GP-TS), which disqualifies priors with poor predictive performance, and HyperPrior GP-TS (HP-GP-TS), which utilizes a bi-level Thompson sampling scheme. We theoretically analyze both algorithms and establish upper bounds on their respective regret. In addition, we demonstrate the effectiveness of our algorithms compared to the alternatives through extensive experiments with synthetic and real-world data.
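The abstract describes the algorithms only at a high level. To make the prior-elimination idea concrete, the following is a minimal, self-contained sketch of GP Thompson sampling over a small set of candidate priors, where priors with poor one-step-ahead predictive log-likelihood are dropped. It is an illustrative toy, not the paper's PE-GP-TS: the candidate lengthscales, the noise level, the round-robin prior schedule, and the elimination threshold are all assumptions made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(X1, X2, ls):
    """Squared-exponential kernel on 1-D inputs with lengthscale ls."""
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(Xt, yt, Xs, ls, noise=0.1):
    """GP posterior mean and covariance at test points Xs given data (Xt, yt)."""
    K = rbf(Xt, Xt, ls) + noise ** 2 * np.eye(len(Xt))
    Ks = rbf(Xt, Xs, ls)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, yt))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    cov = rbf(Xs, Xs, ls) - v.T @ v
    return mu, cov

# Toy objective on a grid (unknown to the learner); noise level is assumed.
X = np.linspace(0.0, 1.0, 50)
f = np.sin(6 * X)
noise = 0.1

# Candidate "priors" = candidate kernel lengthscales (illustrative choice).
priors = [0.05, 0.2, 0.8]
active = list(range(len(priors)))
Xt, yt = [], []

for t in range(30):
    # Thompson-sampling step: draw a function from the posterior under one
    # active prior (round-robin schedule, an assumption) and play its argmax.
    ls = priors[active[t % len(active)]]
    if Xt:
        mu, cov = gp_posterior(np.array(Xt), np.array(yt), X, ls, noise)
        sample = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(len(X)))
    else:
        sample = rng.normal(size=len(X))  # no data yet: sample arbitrarily
    i_star = int(np.argmax(sample))
    Xt.append(X[i_star])
    yt.append(f[i_star] + noise * rng.normal())

    # Elimination step: score each active prior by the predictive
    # log-likelihood of the newest observation, drop clearly inferior ones.
    if len(Xt) >= 10 and len(active) > 1:
        lls = []
        for i in active:
            mu, cov = gp_posterior(np.array(Xt[:-1]), np.array(yt[:-1]),
                                   np.array([Xt[-1]]), priors[i], noise)
            var = cov[0, 0] + noise ** 2
            lls.append(-0.5 * np.log(2 * np.pi * var)
                       - 0.5 * (yt[-1] - mu[0]) ** 2 / var)
        best = max(lls)
        # Threshold of 2 nats below the best is an illustrative choice.
        active = [i for i, ll in zip(active, lls) if ll > best - 2.0]

print(f"rounds: {len(Xt)}, surviving priors: {len(active)}")
```

The sketch only conveys the structure: a per-round Thompson-sampling draw under a currently active prior, followed by an elimination test on predictive performance. The paper's actual elimination criterion, regret analysis, and the HP-GP-TS bi-level scheme are not reproduced here.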