Adversarial attacks are an important security concern for computer vision (CV), as they enable malicious attackers to reliably manipulate CV models. Existing attacks aim to elicit an attacker-chosen output while leaving the model fully intact on clean data. With CV models becoming increasingly valuable assets in applied practice, a new attack vector is emerging: disrupting the models as a form of economic sabotage. This paper opens the exploration of damaging adversarial attacks (DAAs), which seek to damage the target model and maximize the total cost incurred by the damage. As a pioneering DAA, this paper proposes Trainwreck, a train-time attack that poisons the training data of image classifiers to degrade their performance. Trainwreck conflates the data of similar classes using stealthy ($\epsilon \leq 8/255$) class-pair universal perturbations computed with a surrogate model. Trainwreck is a black-box, transferable attack: it requires no knowledge of the target model's architecture, and a single poisoned dataset degrades the performance of any model trained on it. The experimental evaluation on CIFAR-10 and CIFAR-100 demonstrates that Trainwreck is an effective attack across various model architectures, including EfficientNetV2, ResNeXt-101, and a finetuned ViT-L-16. The strength of the attack can be tuned via the poison rate parameter. Finally, data redundancy with file hashing and/or pixel difference is identified as a reliable defense against Trainwreck and similar DAAs. The code is available at https://github.com/JanZahalka/trainwreck.
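The abstract does not describe how Trainwreck computes its class-pair perturbations, so the sketch below does not reproduce the attack. It only illustrates two concrete pieces the abstract names: the $\epsilon \leq 8/255$ budget (which on 8-bit images means each pixel moves by at most 8 intensity levels) and the redundancy defense, where stored hashes flag modified images and a pixel difference against a clean backup measures the change. It is a minimal sketch under stated assumptions: images are flat `bytes` of uint8 pixels, the defender keeps SHA-256 digests and a clean backup, and all helper names (`perturb`, `digest`, `find_tampered`, `max_pixel_diff`) are hypothetical.

```python
import hashlib
from typing import Dict, List, Sequence

# Assumption: eps = 8/255 on a [0, 1] scale equals 8 intensity levels on uint8 images.
EPS_UINT8 = 8


def perturb(pixels: bytes, delta: Sequence[int], eps: int = EPS_UINT8) -> bytes:
    """Add an L-infinity bounded perturbation (hypothetical helper) and clip to uint8."""
    if len(pixels) != len(delta):
        raise ValueError("image and perturbation must have the same size")
    out = []
    for p, d in zip(pixels, delta):
        d = max(-eps, min(eps, d))           # project each entry onto the eps-ball
        out.append(max(0, min(255, p + d)))  # keep the result a valid pixel value
    return bytes(out)


def digest(pixels: bytes) -> str:
    """SHA-256 digest of the stored image bytes."""
    return hashlib.sha256(pixels).hexdigest()


def find_tampered(dataset: Dict[str, bytes], reference_hashes: Dict[str, str]) -> List[str]:
    """Names of images whose current hash differs from the hash recorded on clean data."""
    return sorted(
        name for name, pixels in dataset.items()
        if reference_hashes.get(name) != digest(pixels)
    )


def max_pixel_diff(a: bytes, b: bytes) -> int:
    """Largest absolute per-pixel difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same size")
    return max((abs(x - y) for x, y in zip(a, b)), default=0)
```

In a toy usage, `clean = {"img0": bytes([10, 20, 30, 40]), "img1": bytes([200, 250, 255, 0])}` is hashed up front; replacing `img1` with `perturb(clean["img1"], [8, 8, -8, -8])` makes `find_tampered` return `["img1"]`, and `max_pixel_diff` against the backup reports a change of 8 levels, within the budget. Restoring flagged files from the backup then undoes the poisoning. Hashing catches any byte-level change, while pixel difference stays meaningful when files were re-encoded and their hashes differ for benign reasons.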