The Lipschitz bandit problem is a variant of the stochastic bandit problem with a continuous arm set defined on a metric space, where the reward function satisfies a Lipschitz constraint. In this paper, we introduce a new problem: Lipschitz bandits under adversarial corruptions, in which an adaptive adversary corrupts the stochastic rewards up to a total budget $C$, measured as the sum of corruption levels over the time horizon $T$. We consider both weak and strong adversaries: the weak adversary does not know the current action before attacking, while the strong adversary can observe it. We present the first robust Lipschitz bandit algorithms that achieve sublinear regret under both types of adversary, even when the total corruption budget $C$ is unknown to the agent. We provide a lower bound under each type of adversary and show that our algorithm is optimal against the strong adversary. Finally, we conduct experiments to illustrate the effectiveness of our algorithms against two classic kinds of attacks.
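To make the corruption model concrete, the following is a minimal Python sketch of a budgeted adversary acting on a one-dimensional Lipschitz reward. It is an illustration under stated assumptions, not the paper's algorithm or attack: the reward function `mean_reward`, the class `BudgetedAdversary`, the attack rule, the noise level, and the uniformly random arm selection are all hypothetical choices made here. The only features taken from the abstract are that the total corruption is capped by a budget $C$ summed over the horizon $T$, and that the strong adversary may look at the chosen arm while the weak adversary may not.

```python
import random


def mean_reward(x):
    # Hypothetical 1-Lipschitz mean reward on [0, 1], maximized at x = 0.5.
    return 1.0 - abs(x - 0.5)


class BudgetedAdversary:
    """Illustrative adversary whose total corruption never exceeds `budget`."""

    def __init__(self, budget, strong):
        self.remaining = budget
        self.strong = strong

    def corrupt(self, arm, reward):
        # Assumed attack rule: the strong adversary observes the arm and
        # attacks only pulls near the optimum; the weak adversary ignores
        # the arm and attacks every round until its budget is spent.
        if self.strong and abs(arm - 0.5) > 0.1:
            return reward
        level = min(1.0, self.remaining)  # per-round corruption level
        self.remaining -= level
        return reward - level


def run(budget, strong, horizon, seed=0):
    """Simulate `horizon` rounds; return (arm, clean, observed) triples."""
    rng = random.Random(seed)
    adversary = BudgetedAdversary(budget, strong)
    history = []
    for _ in range(horizon):
        arm = rng.random()  # placeholder policy: uniformly random arm
        clean = mean_reward(arm) + rng.gauss(0.0, 0.1)  # stochastic reward
        observed = adversary.corrupt(arm, clean)
        history.append((arm, clean, observed))
    return history


def total_corruption(history):
    # Sum of per-round corruption levels, i.e. the quantity capped by C.
    return sum(abs(observed - clean) for _, clean, observed in history)
```

For example, `total_corruption(run(budget=5.0, strong=True, horizon=50))` stays at or below the budget of 5.0, and every corrupted round in that run has an arm within 0.1 of the optimum, since this strong adversary spends its budget only where it observes a near-optimal pull.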