In the kernelized bandit problem, a learner aims to sequentially compute the optimum of a function lying in a reproducing kernel Hilbert space, given only noisy evaluations at sequentially chosen points. In particular, the learner aims to minimize regret, a measure of the suboptimality of the points chosen. Arguably the most popular algorithm for this problem is the Gaussian Process Upper Confidence Bound (GP-UCB) algorithm, which acts based on a simple linear estimator of the unknown function. Despite its popularity, existing analyses of GP-UCB give a suboptimal regret rate, which fails to be sublinear for many commonly used kernels such as the Mat\'ern kernel. This has led to a longstanding open question: are existing regret analyses for GP-UCB tight, or can the bounds be improved by more sophisticated analytical techniques? In this work, we resolve this open question and show that GP-UCB enjoys nearly optimal regret. In particular, our results yield sublinear regret rates for the Mat\'ern kernel, improving over state-of-the-art analyses and partially resolving a COLT open problem posed by Vakili et al. Our improvements rely on a key technical contribution: regularizing kernel ridge estimators in proportion to the smoothness of the underlying kernel $k$. Applying this key idea together with a largely overlooked concentration result in separable Hilbert spaces (for which we provide an independent, simplified derivation), we are able to provide a tighter analysis of the GP-UCB algorithm.
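To make the objects in the abstract concrete, here is a minimal sketch of GP-UCB acting on a regularized kernel ridge estimator over a finite grid of candidate points. It assumes a one-dimensional Mat\'ern kernel with smoothness $\nu = 1/2$, a fixed regularization parameter `lam`, and a fixed exploration weight `beta`. All function names and parameter values are hypothetical and chosen for illustration. The sketch does not implement the smoothness-dependent choice of regularization or the confidence-width schedule from the analysis. At each round it computes the kernel ridge mean $\mu_t(x) = k_t(x)^\top (K_t + \lambda I)^{-1} y_t$ and width $\sigma_t(x)^2 = k(x,x) - k_t(x)^\top (K_t + \lambda I)^{-1} k_t(x)$, then queries $\arg\max_x \mu_t(x) + \beta\,\sigma_t(x)$.

```python
import numpy as np


def matern12(a, b, lengthscale=0.2):
    """Matern kernel with nu = 1/2 on 1-D inputs: k(x, x') = exp(-|x - x'| / lengthscale)."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / lengthscale)


def kernel_ridge_posterior(X, y, grid, lam):
    """Kernel ridge mean and width at each grid point, with regularization lam > 0."""
    K = matern12(X, X) + lam * np.eye(len(X))   # K_t + lam * I
    k_star = matern12(X, grid)                  # k_t(x) for every grid point, shape (t, m)
    mu = k_star.T @ np.linalg.solve(K, y)       # k_t(x)^T (K_t + lam I)^{-1} y_t
    v = np.linalg.solve(K, k_star)              # (K_t + lam I)^{-1} k_t(x)
    var = 1.0 - np.sum(k_star * v, axis=0)      # k(x, x) = 1 for this kernel
    return mu, np.sqrt(np.clip(var, 0.0, None)) # clip tiny negative round-off


def gp_ucb(f, grid, T=30, lam=1.0, beta=2.0, noise=0.1, seed=0):
    """Run T rounds of GP-UCB on a finite grid against noisy evaluations of f."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(T):
        if not X:
            # No data yet: pick a uniformly random first point.
            x = grid[rng.integers(len(grid))]
        else:
            mu, sigma = kernel_ridge_posterior(np.array(X), np.array(y), grid, lam)
            # Optimistic choice: maximize the upper confidence bound.
            x = grid[np.argmax(mu + beta * sigma)]
        X.append(x)
        y.append(f(x) + noise * rng.standard_normal())
    return np.array(X), np.array(y)
```

For example, `gp_ucb(lambda x: -(x - 0.3) ** 2, np.linspace(0.0, 1.0, 201))` returns the 30 queried points and their noisy values. Here `lam` is exposed as an ordinary argument precisely because how to set it is the design choice the abstract identifies as central.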