Among the various acquisition functions (AFs) in Bayesian optimization (BO), Gaussian process upper confidence bound (GP-UCB) and Thompson sampling (TS) are well-known options with established theoretical properties regarding Bayesian cumulative regret (BCR). Recently, a randomized variant of GP-UCB was shown to achieve a tighter BCR bound than GP-UCB; for brevity, we refer to this as the tighter BCR bound. Inspired by this result, this paper first shows that TS also achieves the tighter BCR bound. In practice, however, GP-UCB often suffers from manual hyperparameter tuning and TS from over-exploration. To overcome these difficulties, we propose another AF, called probability of improvement from the maximum of a sample path (PIMS). We show that PIMS achieves the tighter BCR bound and, unlike GP-UCB, requires no hyperparameter tuning. Furthermore, we report a wide range of experiments focusing on how effectively PIMS mitigates the practical issues of GP-UCB and TS.
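The abstract names PIMS but does not give its formula, so the following is only a minimal sketch of one plausible reading of the name: draw one sample path from the GP posterior, take its maximum, and score each candidate by the posterior probability that the objective exceeds that maximum. The RBF kernel, the finite candidate grid, the noise level, and the helper names (`rbf_kernel`, `gp_posterior`, `pims_scores`) are all illustrative assumptions, not the authors' implementation.

```python
import math
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between two 1-D arrays of inputs (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_posterior(x_train, y_train, x_cand, noise=1e-4):
    """Zero-mean GP posterior mean and covariance on the candidate points."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tc = rbf_kernel(x_train, x_cand)
    k_cc = rbf_kernel(x_cand, x_cand)
    solved = np.linalg.solve(k_tt, k_tc)  # K_tt^{-1} K_tc
    mean = solved.T @ y_train
    cov = k_cc - k_tc.T @ solved
    return mean, cov


def pims_scores(mean, cov, rng):
    """Probability of improvement over the maximum of one posterior sample path.

    Assumed reading of the PIMS name; the threshold g_star is random,
    so no confidence parameter has to be chosen by hand.
    """
    # One joint posterior sample over all candidates (a discretized sample path).
    path = rng.multivariate_normal(mean, cov, check_valid="ignore")
    g_star = path.max()
    # Posterior standard deviation, clipped to avoid division by zero.
    std = np.sqrt(np.clip(np.diag(cov), 1e-12, None))
    z = (mean - g_star) / std
    erf = np.vectorize(math.erf)
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + erf(z / math.sqrt(2.0))), g_star


# Toy usage on a 1-D grid with a hypothetical objective sin(6x).
rng = np.random.default_rng(0)
x_train = np.array([0.1, 0.5, 0.9])
y_train = np.sin(6.0 * x_train)
x_cand = np.linspace(0.0, 1.0, 50)
mean, cov = gp_posterior(x_train, y_train, x_cand)
scores, g_star = pims_scores(mean, cov, rng)
x_next = x_cand[np.argmax(scores)]  # candidate to evaluate next
```

Under this reading, the contrast with the other two AFs is visible in the code: TS would simply evaluate the maximizer of `path`, and GP-UCB would maximize `mean + beta * std` for a hand-chosen `beta`, whereas the sketch uses the sampled maximum only as an improvement threshold.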