We study Gaussian Process Thompson Sampling (GP-TS) for sequential decision-making over compact, continuous action spaces and provide a frequentist regret analysis based on fractional Gaussian process posteriors, without relying on the domain discretization used in prior work. We show that the variance inflation commonly assumed in existing analyses of GP-TS can be interpreted as Thompson Sampling with respect to a fractional posterior with tempering parameter $α\in (0,1)$. We derive a kernel-agnostic regret bound expressed in terms of the information gain parameter $γ_t$ and the posterior contraction rate $ε_t$, and identify conditions on the Gaussian process prior under which $ε_t$ can be controlled. As special cases of our general bound, we obtain regret of order $\tilde{\mathcal{O}}(T^{\frac{1}{2}})$ for the squared exponential kernel, $\tilde{\mathcal{O}}(T^{\frac{2ν+3d}{2(2ν+d)}})$ for the Matérn-$ν$ kernel, and $\tilde{\mathcal{O}}(T^{\frac{2ν+3d}{2(2ν+d)}})$ for the rational quadratic kernel. Overall, our analysis provides a unified, discretization-free regret framework for GP-TS that applies across a broad range of kernel classes.
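To make the link between tempering and variance inflation concrete, the following is a minimal sketch, not the paper's algorithm. It assumes Gaussian observation noise, for which raising the likelihood to the power $α$ is equivalent to replacing the noise variance $σ^2$ by $σ^2/α$, so the fractional posterior has a larger variance than the standard one when $α<1$. The kernel hyperparameters, the noise level, the function names (`sq_exp_kernel`, `fractional_gp_posterior`, `thompson_step`), and the toy data are all hypothetical. The finite candidate grid is used only so that a sample can be drawn and maximized in a few lines; the analysis described above does not rely on such a discretization.

```python
import numpy as np

def sq_exp_kernel(a, b, lengthscale=0.2):
    """Squared exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def fractional_gp_posterior(x_obs, y_obs, x_query, noise_var=0.01, alpha=1.0):
    """Posterior mean and covariance of a zero-mean GP with a tempered likelihood.

    Assumption: Gaussian noise, so N(y | f, s2)^alpha is proportional to
    N(y | f, s2 / alpha); tempering is an inflated effective noise variance.
    """
    eff_noise = noise_var / alpha
    K = sq_exp_kernel(x_obs, x_obs) + eff_noise * np.eye(len(x_obs))
    K_s = sq_exp_kernel(x_obs, x_query)
    K_ss = sq_exp_kernel(x_query, x_query)
    L = np.linalg.cholesky(K)
    A = np.linalg.solve(L, K_s)    # L^{-1} K_s
    w = np.linalg.solve(L, y_obs)  # L^{-1} y
    mean = A.T @ w
    cov = K_ss - A.T @ A
    return mean, cov

def thompson_step(mean, cov, rng, jitter=1e-6):
    """Draw one joint posterior sample on the candidates; return the argmax index."""
    sample = rng.multivariate_normal(mean, cov + jitter * np.eye(len(mean)))
    return int(np.argmax(sample))

rng = np.random.default_rng(0)
x_obs = np.array([0.1, 0.4, 0.8])        # hypothetical past actions
y_obs = np.sin(2 * np.pi * x_obs)        # hypothetical observed rewards
x_query = np.linspace(0.0, 1.0, 50)      # illustrative candidate grid only

# alpha = 1 is the standard posterior; alpha = 0.5 is a fractional posterior.
mean_full, cov_full = fractional_gp_posterior(x_obs, y_obs, x_query, alpha=1.0)
mean_frac, cov_frac = fractional_gp_posterior(x_obs, y_obs, x_query, alpha=0.5)

# One Thompson Sampling step under the fractional posterior.
next_idx = thompson_step(mean_frac, cov_frac, rng)
```

Comparing `np.diag(cov_frac)` with `np.diag(cov_full)` shows the pointwise posterior variance growing as $α$ decreases, which is the sense in which a tempered posterior behaves like an inflated one in this sketch.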