Black-box optimisation of an unknown function from expensive and noisy evaluations is a ubiquitous problem in machine learning, academic research and industrial production. An abstraction of the problem can be formulated as a kernel-based bandit problem (also known as Bayesian optimisation), in which a learner aims to optimise a kernelized function through sequential noisy observations. Existing work predominantly assumes that feedback is immediately available, an assumption that fails in many real-world situations, including recommendation systems, clinical trials and hyperparameter tuning. We consider a kernel bandit problem under stochastically delayed feedback and propose an algorithm with $\tilde{\mathcal{O}}(\sqrt{\Gamma_k(T)T}+\mathbb{E}[\tau])$ regret, where $T$ is the number of time steps, $\Gamma_k(T)$ is the maximum information gain of the kernel with $T$ observations, and $\tau$ is the delay random variable. This is a significant improvement over the state-of-the-art regret bound of $\tilde{\mathcal{O}}(\Gamma_k(T)\sqrt{T}+\mathbb{E}[\tau]\Gamma_k(T))$ reported in Verma et al. (2022). In particular, for highly non-smooth kernels the information gain grows almost linearly in time, so the existing bound exceeds the trivial linear regret and becomes vacuous. We also validate our theoretical results with simulations.
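To make the comparison of the two regret bounds concrete, the sketch below evaluates both expressions under an illustrative assumption that is not from the abstract: the information gain grows polynomially as $\Gamma_k(T) = T^{\alpha}$ with $0 \le \alpha < 1$. Constants and the logarithmic factors hidden in $\tilde{\mathcal{O}}$ are dropped, and the mean delay value is a hypothetical choice. Because any learner trivially incurs at most $\mathcal{O}(T)$ regret, a bound larger than $T$ is vacuous. With these assumptions, $\Gamma_k(T)\sqrt{T} = T^{\alpha + 1/2}$ reaches $T$ once $\alpha \ge 1/2$, while $\sqrt{\Gamma_k(T)T} = T^{(1+\alpha)/2}$ stays below $T$ for every $\alpha < 1$.

```python
import math


def regret_bounds(T: float, alpha: float, mean_delay: float) -> tuple[float, float]:
    """Evaluate both regret bounds with constants and log factors dropped.

    Assumption (illustrative only): Gamma_k(T) = T**alpha with 0 <= alpha < 1.
    Returns (new_bound, old_bound).
    """
    gamma = T ** alpha
    # sqrt(Gamma_k(T) * T) + E[tau]: the bound stated in the abstract
    new_bound = math.sqrt(gamma * T) + mean_delay
    # Gamma_k(T) * sqrt(T) + E[tau] * Gamma_k(T): the bound of Verma et al. (2022)
    old_bound = gamma * math.sqrt(T) + mean_delay * gamma
    return new_bound, old_bound


if __name__ == "__main__":
    T = 1e6
    for alpha in (0.25, 0.5, 0.9):
        new, old = regret_bounds(T=T, alpha=alpha, mean_delay=10.0)
        print(f"alpha={alpha}: new={new:.3g}, old={old:.3g}, trivial T={T:.3g}")
```

For $\alpha = 0.9$ (an almost linear information gain) and $T = 10^6$, the older expression is several orders of magnitude larger than $T$, whereas the new one remains below $T$.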