In the kernelized bandit problem, a learner aims to sequentially compute the optimum of a function lying in a reproducing kernel Hilbert space given only noisy evaluations at sequentially chosen points. In particular, the learner aims to minimize regret, which is a measure of the suboptimality of the choices made. Arguably the most popular algorithm is the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm, which involves acting based on a simple linear estimator of the unknown function. Despite its popularity, existing analyses of GP-UCB give a suboptimal regret rate, which fails to be sublinear for many commonly used kernels such as the Mat\'ern kernel. This has led to a longstanding open question: are existing regret analyses for GP-UCB tight, or can bounds be improved by using more sophisticated analytical techniques? In this work, we resolve this open question and show that GP-UCB enjoys nearly optimal regret. In particular, our results directly imply sublinear regret rates for the Mat\'ern kernel, improving over the state-of-the-art analyses and partially resolving a COLT open problem posed by Vakili et al. Our improvements rely on two key technical results. First, we use modern supermartingale techniques to construct a novel, self-normalized concentration inequality that greatly simplifies existing approaches. Second, we address the importance of regularizing in proportion to the smoothness of the underlying kernel $k$. Together, these new technical tools enable a simplified, tighter analysis of the GP-UCB algorithm.
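To make the setting concrete, the following is a minimal sketch of a GP-UCB-style loop over a finite grid of candidate points. It is an illustration under stated assumptions, not the procedure analyzed in this work: the squared-exponential kernel, the fixed confidence multiplier `beta`, the regularization level `reg`, and the names `rbf_kernel` and `gp_ucb` are all hypothetical choices made for the example. In each round the learner fits a regularized kernel (ridge) estimate, which is linear in the observed rewards, and queries the point with the largest upper confidence bound.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D point arrays a and b (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_ucb(f, grid, horizon, reg=1.0, beta=2.0, noise_sd=0.1, seed=0):
    """Run a GP-UCB-style loop on a finite grid.

    reg and beta are placeholder values, not the tuned choices from any analysis.
    Returns the chosen grid indices and the cumulative regret after each round.
    """
    rng = np.random.default_rng(seed)
    f_grid = f(grid)  # used only to simulate rewards and to report regret
    chosen, rewards, inst_regret = [], [], []

    for _ in range(horizon):
        if not chosen:
            # No data yet: zero estimate, prior width k(x, x) = 1.
            mean = np.zeros(len(grid))
            var = np.ones(len(grid))
        else:
            X = grid[chosen]
            y = np.array(rewards)
            A = rbf_kernel(X, X) + reg * np.eye(len(X))
            k_star = rbf_kernel(grid, X)  # shape (n_grid, n_obs)
            # Kernel ridge estimate: linear in the observed rewards y.
            mean = k_star @ np.linalg.solve(A, y)
            # Width term: k(x, x) - k_t(x)^T (K + reg I)^{-1} k_t(x).
            v = np.linalg.solve(A, k_star.T)
            var = np.clip(1.0 - np.sum(k_star * v.T, axis=1), 0.0, None)

        # Act optimistically: pick the maximizer of the upper confidence bound.
        idx = int(np.argmax(mean + beta * np.sqrt(var)))
        chosen.append(idx)
        rewards.append(f_grid[idx] + noise_sd * rng.standard_normal())
        inst_regret.append(f_grid.max() - f_grid[idx])

    return chosen, np.cumsum(inst_regret)


if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 101)
    chosen, regret = gp_ucb(lambda x: -(x - 0.3) ** 2, grid, horizon=30)
    print("final cumulative regret:", regret[-1])
```

Here regret is measured against the best point on the grid, so each per-round term is nonnegative and the cumulative curve is nondecreasing. Sublinear regret means this curve grows more slowly than the number of rounds.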