We study Gaussian process Thompson sampling (GP-TS), a widely used Bayesian optimization method, under the assumption that the objective function is a sample path from a GP. Whereas both high-probability and expected regret bounds are established for the GP upper confidence bound (GP-UCB), most analyses of GP-TS have been limited to expected regret. Moreover, it remains unclear whether recent analyses of GP-UCB, which cover the lenient regret and an improved cumulative regret upper bound, can be applied to GP-TS. To fill these gaps, this paper shows several regret bounds: (i) a regret lower bound for GP-TS, which implies that, with probability $\delta$, the regret of GP-TS depends polynomially on $1/\delta$, (ii) an upper bound on the second moment of the cumulative regret, which directly yields a regret upper bound with improved dependence on $\delta$, (iii) expected lenient regret upper bounds, and (iv) a cumulative regret upper bound with improved dependence on the time horizon $T$. Along the way, we provide several useful lemmas, including a relaxation of the condition that a recent analysis requires to obtain improved regret upper bounds in terms of $T$.
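To see why a second-moment bound as in (ii) can tighten the dependence on $\delta$, the following sketch applies Markov's inequality twice. The symbol $R_T$ for the cumulative regret is notation assumed here, together with $R_T \ge 0$. This is the standard textbook argument and not necessarily the exact statement proved in the paper.

```latex
% Assumed notation: R_T >= 0 is the cumulative regret after T rounds.
% Markov's inequality applied to R_T (first moment only):
\[
  \Pr\!\left( R_T \ge \frac{\mathbb{E}[R_T]}{\delta} \right) \le \delta ,
\]
% so an expected-regret bound gives a high-probability bound scaling as 1/\delta.
% Markov's inequality applied to R_T^2 (second moment):
\[
  \Pr\!\left( R_T \ge \sqrt{\frac{\mathbb{E}[R_T^2]}{\delta}} \right)
  = \Pr\!\left( R_T^2 \ge \frac{\mathbb{E}[R_T^2]}{\delta} \right) \le \delta ,
\]
% so a second-moment bound gives a high-probability bound scaling as 1/\sqrt{\delta}.
```

Under these assumptions, the dependence improves from $1/\delta$ to $1/\sqrt{\delta}$. It remains polynomial in $1/\delta$, as the lower bound in (i) indicates it must.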