We show that a kernel estimator that uses multiple function evaluations can be converted into a sampling-based bandit estimator whose expectation equals the original kernel estimate. Plugging this estimator into the standard follow-the-regularised-leader (FTRL) algorithm yields a bandit convex optimisation algorithm with $\tilde{O}(t^{1/2})$ regret against adversarially chosen, time-varying convex loss functions.
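The conversion can be illustrated with a minimal sketch. It assumes, purely hypothetically, that the kernel estimate is a fixed linear combination $\sum_i w_i f(x_i)$ of loss values at $m$ query points. The names `kernel_estimate`, `bandit_estimate` and `bandit_expectation`, together with the weights $w_i$ and sampling probabilities $p_i$, are illustrative and are not taken from the paper. The sketch draws a single index $I$ with probability $p_I$, queries $f$ only at $x_I$, and reweights by $1/p_I$. This is standard importance weighting, so the expectation $\sum_i p_i \cdot w_i f(x_i)/p_i$ recovers the multi-evaluation estimate.

```python
import random
from typing import Callable, Sequence


def kernel_estimate(f: Callable[[float], float],
                    points: Sequence[float],
                    weights: Sequence[float]) -> float:
    # Hypothetical multi-evaluation estimate: queries f at every point.
    return sum(w * f(x) for x, w in zip(points, weights))


def bandit_estimate(f: Callable[[float], float],
                    points: Sequence[float],
                    weights: Sequence[float],
                    probs: Sequence[float],
                    rng: random.Random) -> float:
    # Single-evaluation estimate: sample I ~ probs, query f once at x_I,
    # and importance-weight by 1 / p_I so the estimator stays unbiased.
    assert all(p > 0 for p in probs), "every point needs positive probability"
    i = rng.choices(range(len(points)), weights=probs, k=1)[0]
    return weights[i] * f(points[i]) / probs[i]


def bandit_expectation(f: Callable[[float], float],
                       points: Sequence[float],
                       weights: Sequence[float],
                       probs: Sequence[float]) -> float:
    # Exact expectation of bandit_estimate: sum_i p_i * (w_i f(x_i) / p_i).
    return sum(p * (w * f(x) / p) for x, w, p in zip(points, weights, probs))


if __name__ == "__main__":
    f = lambda x: x ** 2
    pts, w, p = [0.0, 1.0, 2.0], [0.5, 0.3, 0.2], [0.2, 0.5, 0.3]
    rng = random.Random(0)
    n = 200_000
    mc = sum(bandit_estimate(f, pts, w, p, rng) for _ in range(n)) / n
    print(kernel_estimate(f, pts, w), bandit_expectation(f, pts, w, p), mc)
```

The sampling distribution only affects the estimator's variance, not its mean. In the actual algorithm, the choice of $p_i$ (and whether the estimate is a scalar or a gradient vector) would follow the paper's construction rather than this toy setup.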