Sequential optimization of black-box functions from noisy evaluations has been widely studied, with Gaussian process bandit algorithms such as GP-UCB guaranteeing no-regret in stationary settings. For time-varying objectives, however, no-regret is known to be unattainable under pure bandit feedback unless strong and often unrealistic assumptions are imposed. In this article, we propose a novel method to optimize time-varying rewards in the frequentist setting, where the objective has bounded RKHS norm. Time variations are captured through uncertainty injection (UI), which enables heteroscedastic GP regression that adapts past observations to the current time step. Since no-regret is unattainable in general under the strict bandit setting, we relax the latter by allowing additional queries on previously observed points. Building on sparse inference and on the effect of UI on regret, we propose W-SparQ-GP-UCB, an online algorithm that achieves no-regret with only a vanishing number of additional queries per iteration. To assess the theoretical limits of this approach, we establish a lower bound on the number of additional queries required for no-regret, demonstrating the efficiency of our method. Finally, we provide a comprehensive analysis linking the degree of time variation of the function to achievable regret rates, together with upper and lower bounds on the number of additional queries needed in each regime.
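To make the uncertainty-injection idea concrete, below is a minimal sketch of heteroscedastic GP regression in which each past observation's noise variance is inflated according to its age, so that stale evaluations are downweighted at the current time step. The inflation rule, the rate `eps`, and the helper `ui_gp_posterior` are illustrative assumptions for exposition, not the paper's exact construction.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=0.2):
    # Squared-exponential kernel on 1-D inputs.
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def ui_gp_posterior(X, y, t_obs, t_now, x_star, noise_var=0.01, eps=0.05):
    """Heteroscedastic GP posterior where each observation's noise variance
    is inflated by an injected-uncertainty term growing with its age
    (t_now - t_obs). `eps` is an illustrative inflation rate."""
    K = rbf_kernel(X, X)
    # Uncertainty injection: older samples are treated as noisier,
    # yielding a heteroscedastic (per-point) noise term on the diagonal.
    injected = noise_var + eps * (t_now - t_obs)
    K_noisy = K + np.diag(injected)
    k_star = rbf_kernel(X, x_star)
    alpha = np.linalg.solve(K_noisy, y)
    mu = k_star.T @ alpha                      # posterior mean at x_star
    v = np.linalg.solve(K_noisy, k_star)
    var = rbf_kernel(x_star, x_star) - k_star.T @ v
    return mu, np.diag(var)                    # mean and pointwise variance

# Usage: 20 noisy evaluations collected at times 0..19, queried at t = 20.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 20)
t_obs = np.arange(20.0)
y = np.sin(6 * X) + 0.1 * rng.standard_normal(20)
x_star = np.linspace(0, 1, 5)
mu, var = ui_gp_posterior(X, y, t_obs, t_now=20.0, x_star=x_star)
```

Under such a scheme the posterior variance at long-unvisited inputs grows with elapsed time, which is what allows a GP-UCB-style acquisition rule to remain calibrated as the objective drifts.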