This paper studies a non-stationary kernelized bandit (KB) problem, also called time-varying Bayesian optimization, in which one seeks to minimize regret under an unknown reward function that varies over time. In particular, we focus on designing a near-optimal algorithm whose regret upper bound matches the regret lower bound. Toward this goal, we establish the first algorithm-independent regret lower bound for non-stationary KB with squared exponential and Mat\'ern kernels, which reveals that an existing optimization-based KB algorithm, with a slight modification, is near-optimal. However, this existing algorithm suffers from feasibility issues owing to its huge computational cost. We therefore propose a novel near-optimal algorithm called restarting phased elimination with random permutation (R-PERP), which bypasses this computational cost. A key technical ingredient is a simple random permutation procedure over the query candidates, which enables us to derive a novel, tighter confidence bound tailored to non-stationary problems.