Confidence bounds are an essential tool for rigorously quantifying the uncertainty of predictions. They are a core component of many sequential learning and decision-making algorithms, where tighter confidence bounds give rise to algorithms with better empirical performance and better performance guarantees. In this work, we use martingale tail inequalities to establish new confidence bounds for sequential kernel regression. Our confidence bounds can be computed by solving a conic program, although this naive formulation quickly becomes impractical because the number of variables grows with the sample size. However, we show that the dual of this conic program allows us to efficiently compute tight confidence bounds. We prove that our new confidence bounds are always tighter than existing ones in this setting. We apply our confidence bounds to kernel bandit problems, and we find that when our confidence bounds replace existing ones, the KernelUCB (GP-UCB) algorithm has better empirical performance, a matching worst-case performance guarantee, and comparable computational cost.