Sequential optimization of black-box functions from noisy evaluations has been widely studied, with Gaussian Process bandit algorithms such as GP-UCB guaranteeing no-regret in stationary settings. However, for time-varying objectives, it is known that no-regret is unattainable under pure bandit feedback unless strong and often unrealistic assumptions are imposed. In this article, we propose a novel method to optimize time-varying rewards in the frequentist setting, where the objective has bounded RKHS norm. Time variations are captured through uncertainty injection (UI), which enables heteroscedastic GP regression that adapts past observations to the current time step. Since no-regret is unattainable in general in the strict bandit setting, we relax the latter by allowing additional queries at previously observed points. Building on sparse inference and the effect of UI on regret, we propose \textbf{W-SparQ-GP-UCB}, an online algorithm that achieves no-regret with only a vanishing number of additional queries per iteration. To assess the theoretical limits of this approach, we establish a lower bound on the number of additional queries required for no-regret, thereby proving the efficiency of our method. Finally, we provide a comprehensive analysis linking the degree of time-variation of the function to the achievable regret rates, together with upper and lower bounds on the number of additional queries needed in each regime.
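To make the uncertainty-injection idea concrete, the following is a minimal sketch, not the paper's W-SparQ-GP-UCB algorithm: a heteroscedastic GP posterior in which each past observation's noise variance is inflated in proportion to its age, followed by a standard UCB selection step. The RBF kernel, the linear inflation schedule `ui_rate * age`, and all names (`heteroscedastic_gp_posterior`, `ucb_select`, `beta`) are illustrative assumptions, standing in for whatever injection schedule and acquisition rule the paper actually uses.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def heteroscedastic_gp_posterior(X, y, ages, Xstar,
                                 noise_var=1e-2, ui_rate=0.05):
    """GP posterior mean/std where observation i carries noise
    noise_var + ui_rate * ages[i]: uncertainty injection, so older
    samples are trusted less (linear schedule is an assumption)."""
    K = rbf_kernel(X, X) + np.diag(noise_var + ui_rate * ages)
    Ks = rbf_kernel(Xstar, X)
    Kss = rbf_kernel(Xstar, Xstar)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks @ alpha
    V = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - (V**2).sum(0)
    return mu, np.sqrt(np.maximum(var, 0.0))

def ucb_select(X, y, ages, candidates, beta=2.0):
    """GP-UCB step: pick the candidate maximizing mu + beta * std."""
    mu, sd = heteroscedastic_gp_posterior(X, y, ages, candidates)
    return candidates[np.argmax(mu + beta * sd)]

# Toy usage: 20 past observations of a drifting 1-d objective.
rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 1))
ages = np.arange(20, dtype=float)[::-1].copy()  # oldest sample has age 19
y = np.sin(6 * X[:, 0]) + 0.1 * rng.standard_normal(20)
x_next = ucb_select(X, y, ages, rng.uniform(size=(100, 1)))
```

Inflating the noise of stale samples down-weights them smoothly instead of discarding them, which is one way to realize the abstract's claim of "adapting past observations to the current time step" within ordinary GP regression.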