Gaussian process (GP) bandits provide a powerful framework for black-box optimization of unknown functions. The characteristics of the unknown function depend heavily on the assumed GP prior. Most work in the literature assumes that this prior is known, but in practice this seldom holds. Instead, practitioners often rely on maximum likelihood estimation to select the hyperparameters of the prior, an approach that lacks theoretical guarantees. In this work, we study two algorithms for joint prior selection and regret minimization in GP bandits, both based on GP Thompson sampling (GP-TS): Prior-Elimination GP-TS (PE-GP-TS), which disqualifies priors with poor predictive performance, and HyperPrior GP-TS (HP-GP-TS), which uses a bi-level Thompson sampling scheme. We theoretically analyze the algorithms and establish a sublinear regret bound for HP-GP-TS. In addition, we demonstrate the effectiveness of these algorithms compared to the alternatives through extensive experiments with synthetic and real-world data.
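To make the bi-level idea concrete, the following is a minimal sketch of one possible reading of HP-GP-TS, not the authors' implementation. It assumes a finite set of candidate priors that differ only in the lengthscale of a squared-exponential kernel, a uniform hyperprior over them, a one-dimensional grid of arms, and Gaussian observation noise. The function names (`rbf_kernel`, `gp_posterior`, `hp_gp_ts`) and all parameter values are hypothetical. In each round, a prior is sampled from the hyperposterior, a function is then sampled from the GP posterior under that prior, and the maximizer of the sample is played.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale):
    """Squared-exponential kernel between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, lengthscale, noise_var):
    """Return posterior mean and covariance on x_grid, plus the log
    marginal likelihood of y_obs, for a zero-mean GP prior."""
    prior_cov = rbf_kernel(x_grid, x_grid, lengthscale)
    if len(x_obs) == 0:
        return np.zeros(len(x_grid)), prior_cov, 0.0
    n = len(x_obs)
    K = rbf_kernel(x_obs, x_obs, lengthscale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))  # K^{-1} y
    Ks = rbf_kernel(x_grid, x_obs, lengthscale)
    v = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha
    cov = prior_cov - v.T @ v
    log_ml = (-0.5 * y_obs @ alpha
              - np.log(np.diag(L)).sum()
              - 0.5 * n * np.log(2.0 * np.pi))
    return mean, cov, log_ml

def hp_gp_ts(f, x_grid, lengthscales, n_rounds, noise_var=0.01, seed=0):
    """Bi-level Thompson sampling over an assumed finite set of priors."""
    rng = np.random.default_rng(seed)
    x_obs, y_obs = np.empty(0), np.empty(0)
    n_priors = len(lengthscales)
    log_hyperprior = np.full(n_priors, -np.log(n_priors))  # uniform (assumed)
    weights = np.exp(log_hyperprior)
    for _ in range(n_rounds):
        posts = [gp_posterior(x_obs, y_obs, x_grid, ls, noise_var)
                 for ls in lengthscales]
        # Level 1: hyperposterior over priors, then sample one prior.
        log_w = log_hyperprior + np.array([p[2] for p in posts])
        weights = np.exp(log_w - log_w.max())
        weights /= weights.sum()
        k = rng.choice(n_priors, p=weights)
        # Level 2: sample a function from the GP posterior under that prior.
        mean, cov, _ = posts[k]
        jitter = 1e-8 * np.eye(len(x_grid))
        sample = rng.multivariate_normal(mean, cov + jitter,
                                         check_valid="ignore")
        # Play the maximizer of the sample and record a noisy observation.
        x_next = x_grid[np.argmax(sample)]
        y_next = f(x_next) + rng.normal(0.0, np.sqrt(noise_var))
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, y_next)
    # weights holds the hyperposterior that was used in the final round.
    return x_obs, y_obs, weights

# Usage on a toy objective (also an assumption, chosen only for illustration).
grid = np.linspace(0.0, 1.0, 30)
xs, ys, w = hp_gp_ts(lambda x: np.sin(3.0 * x), grid, [0.05, 0.3, 1.0], 15)
```

The sketch recomputes every candidate posterior from scratch in each round, which keeps the code short but scales poorly; a practical version would update the Cholesky factors incrementally. A prior-elimination variant in the spirit of PE-GP-TS would instead drop candidates whose predictions deviate too much from the observations, with an elimination rule that the abstract does not specify.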