We study Gaussian process Thompson sampling (GP-TS), a widely used Bayesian optimization method, under the assumption that the objective function is a sample path from a GP. Whereas the GP upper confidence bound (GP-UCB) has established high-probability and expected regret bounds, most analyses of GP-TS have been limited to expected regret. Moreover, it remains unclear whether the recent analyses of GP-UCB for the lenient regret and for the improved cumulative regret upper bound can be applied to GP-TS. To fill these gaps, this paper shows several regret bounds: (i) a regret lower bound for GP-TS, which implies that, with probability $δ$, GP-TS suffers regret with a polynomial dependence on $1/δ$, (ii) an upper bound on the second moment of the cumulative regret, which directly suggests a regret upper bound with an improved dependence on $δ$, (iii) expected lenient regret upper bounds, and (iv) a cumulative regret upper bound with an improved dependence on the time horizon $T$. Along the way, we provide several useful lemmas, including a relaxation of the condition required by a recent analysis to obtain improved regret upper bounds on $T$.