Backdoor poisoning attacks pose a well-known risk to neural networks. However, most studies have focused on lenient threat models. We introduce Silent Killer, a novel attack that operates in clean-label, black-box settings, uses a stealthy poison and trigger, and outperforms existing methods. We investigate the use of universal adversarial perturbations as triggers in clean-label attacks, following the success of such approaches in poison-label settings. We analyze a naive adaptation and find that crafting the poison with gradient alignment is required to ensure high success rates. We conduct thorough experiments on MNIST, CIFAR10, and a reduced version of ImageNet and achieve state-of-the-art results.
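The abstract does not state the gradient-alignment objective itself. As a minimal sketch, one common formulation in the gradient-matching poisoning literature scores a candidate poison by one minus the cosine similarity between the training gradient it induces and the gradient of the attacker's objective. The function name `alignment_loss` and its arguments are hypothetical, chosen for illustration, and are not taken from the Silent Killer implementation.

```python
import numpy as np

def alignment_loss(target_grad, poison_grad, eps=1e-12):
    """Return 1 - cosine similarity between two flattened gradients.

    Hypothetical helper for illustration only; the paper's exact
    objective may differ. A value near 0 means the poison gradient
    points in the same direction as the attacker's target gradient.
    """
    t = np.ravel(target_grad).astype(float)
    p = np.ravel(poison_grad).astype(float)
    # eps guards against division by zero when a gradient is all zeros.
    cos = np.dot(t, p) / (np.linalg.norm(t) * np.linalg.norm(p) + eps)
    return 1.0 - cos
```

In a setup of this kind, the poison perturbation would be optimized to drive this loss toward zero, typically under a norm bound that keeps the perturbation stealthy.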