In this paper, we consider the problem of black-box optimization with noisy feedback revealed in batches, where the unknown function to optimize has a bounded norm in some Reproducing Kernel Hilbert Space (RKHS). We refer to this as the Batched Kernelized Bandits problem, and we refine and extend existing results on regret bounds. For algorithmic upper bounds, Li and Scarlett (2022) show that $B=O(\log\log T)$ batches suffice to attain near-optimal regret, where $T$ is the time horizon and $B$ is the number of batches. We further refine this by (i) finding the optimal number of batches including constant factors (to within $1+o(1)$), and (ii) removing a factor of $B$ from the regret bound. For algorithm-independent lower bounds, noting that existing results apply only when the batch sizes are fixed in advance, we present novel lower bounds for the case where the batch sizes are chosen adaptively, and show that adaptive batches have essentially the same minimax regret scaling as fixed batches. Furthermore, we consider a robust setting where the goal is to choose points for which the function value remains high even after an adversarial perturbation. We present the robust-BPE algorithm, show that a suitably defined notion of cumulative regret satisfies the same bound as in the non-robust setting, and derive a simple regret bound that is significantly smaller than that of previous work.
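To illustrate why on the order of $\log\log T$ batches can suffice, the following is a minimal sketch of the classic "minimax grid" used in the batched multi-armed bandit literature. It is an assumption on our part and is not the refined schedule described above, nor necessarily the exact schedule of Li and Scarlett (2022). Batch $i$ ends at $t_i = a^{2-2^{1-i}}$ with $a = T^{1/(2-2^{1-B})}$, so that $t_B = T$ and $t_i/\sqrt{t_{i-1}} = a$ for every batch. Under the heuristic that a batch's regret scales like its length times a $1/\sqrt{t_{i-1}}$ accuracy term (ignoring kernel-dependent and logarithmic factors), each batch then costs about $a$. Choosing $B = \lceil \log_2 \ln T \rceil$ keeps $a$ within a factor $e$ of $\sqrt{T}$. The function name `minimax_grid` is hypothetical.

```python
import math


def minimax_grid(T: int, B: int) -> tuple[list[int], float]:
    """Hypothetical batch grid (illustration only, not the paper's schedule).

    Batch i (1-indexed) ends at t_i = a ** (2 - 2 ** (1 - i)), where
    a = T ** (1 / (2 - 2 ** (1 - B))) is chosen so that t_B = T.
    With this choice, t_i / sqrt(t_{i-1}) = a for every batch i >= 2.
    """
    a = T ** (1.0 / (2.0 - 2.0 ** (1 - B)))
    ends = [min(T, math.floor(a ** (2.0 - 2.0 ** (1 - i)))) for i in range(1, B + 1)]
    ends[-1] = T  # guard against floating-point round-off in the last end point
    return ends, a


if __name__ == "__main__":
    T = 10**6
    B = math.ceil(math.log2(math.log(T)))  # about log2(ln T) batches
    ends, a = minimax_grid(T, B)
    print(B, ends, a / math.sqrt(T))  # ratio a / sqrt(T) stays below e
```

Since $2^{-B}\ln T \le 1$ for this $B$, the exponent in $a/\sqrt{T} = \exp\!\big(2^{-B}\ln T/(2-2^{1-B})\big)$ is at most 1, which is where the factor $e$ comes from.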