We study reward poisoning attacks on Combinatorial Multi-armed Bandits (CMAB). We first provide a necessary and sufficient condition for the attackability of CMAB, which depends on intrinsic properties of the CMAB instance such as the reward distributions of super arms and the outcome distributions of base arms. We also devise an attack algorithm for attackable CMAB instances. Contrary to prior understanding of multi-armed bandits, our work reveals the surprising fact that the attackability of a specific CMAB instance also depends on whether the bandit instance is known or unknown to the adversary. This finding indicates that adversarial attacks on CMAB are difficult in practice, and that no general attack strategy exists for every CMAB instance, since the environment is mostly unknown to the adversary. We validate our theoretical findings via extensive experiments on real-world CMAB applications, including the probabilistic maximum covering problem, online minimum spanning tree, cascading bandits for online ranking, and online shortest path.