Trainwreck: A damaging adversarial attack on image classifiers

Adversarial attacks are an important security concern for computer vision (CV), as they enable malicious attackers to reliably manipulate CV models. Existing attacks aim to elicit an output desired by the attacker while keeping the model fully intact on clean data. With CV models becoming increasingly valuable assets in applied practice, a new attack vector is emerging: disrupting the models as a form of economic sabotage. This paper opens up the exploration of damaging adversarial attacks (DAAs), which seek to damage the target model and maximize the total cost incurred by the damage. As a pioneering DAA, this paper proposes Trainwreck, a train-time attack that poisons the training data of image classifiers to degrade their performance. Trainwreck conflates the data of similar classes using stealthy ($\epsilon \leq 8/255$) class-pair universal perturbations computed using a surrogate model. Trainwreck is a black-box, transferable attack: it requires no knowledge of the target model's architecture, and a single poisoned dataset degrades the performance of any model trained on it. The experimental evaluation on CIFAR-10 and CIFAR-100 demonstrates that Trainwreck is an effective attack across various model architectures, including EfficientNetV2, ResNeXt-101, and a finetuned ViT-L-16. The strength of the attack can be customized via the poison rate parameter. Finally, data redundancy combined with file hashing and/or pixel differencing is identified as a reliable defense against Trainwreck and similar DAAs. The code is available at https://github.com/JanZahalka/trainwreck.
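The stealth budget and the redundancy-based defense mentioned in the abstract can be illustrated with a small sketch. This is not the Trainwreck implementation from the linked repository: the function names (`perturb`, `find_tampered`, `max_pixel_diff`) are hypothetical, images are modeled as raw 8-bit pixel buffers, and a random perturbation stands in for the class-pair universal perturbations, which the abstract does not specify in detail. The only assumptions carried over from the text are the bound $\epsilon \leq 8/255$ (8 intensity levels for 8-bit pixels) and the idea of comparing a working dataset against a trusted redundant copy by hash and by pixel difference.

```python
from __future__ import annotations

import hashlib
import random

EPS_LEVELS = 8  # epsilon = 8/255 on a [0, 1] scale is 8 levels for 8-bit pixels


def perturb(pixels: bytes, delta: list[int], eps: int = EPS_LEVELS) -> bytes:
    """Add a perturbation to 8-bit pixels, clipped to the L-infinity budget."""
    out = bytearray(len(pixels))
    for i, (p, d) in enumerate(zip(pixels, delta)):
        d = max(-eps, min(eps, d))        # enforce |d| <= eps
        out[i] = max(0, min(255, p + d))  # stay within the valid pixel range
    return bytes(out)


def sha256(pixels: bytes) -> str:
    """Hex digest used to fingerprint one image buffer."""
    return hashlib.sha256(pixels).hexdigest()


def max_pixel_diff(a: bytes, b: bytes) -> int:
    """Largest absolute per-pixel difference between two equally sized images."""
    return max(abs(x - y) for x, y in zip(a, b))


def find_tampered(trusted: dict[str, bytes], working: dict[str, bytes]) -> list[str]:
    """Names of images that are missing or whose hash differs from the trusted copy."""
    return sorted(
        name for name, px in trusted.items()
        if name not in working or sha256(working[name]) != sha256(px)
    )


# Toy data: four random 32x32 RGB "images" (hypothetical stand-ins for CIFAR samples).
rng = random.Random(0)
n_pixels = 32 * 32 * 3
trusted = {f"img{i}": bytes(rng.randrange(256) for _ in range(n_pixels)) for i in range(4)}

# The working copy is identical except for one image that receives a bounded perturbation.
working = dict(trusted)
raw_delta = [rng.choice([-20, -3, 0, 3, 20]) for _ in range(n_pixels)]
working["img2"] = perturb(trusted["img2"], raw_delta)

tampered = find_tampered(trusted, working)
diff = max_pixel_diff(trusted["img2"], working["img2"])
```

The point of the sketch is that a perturbation small enough to be visually stealthy still changes the file hash, so a trusted redundant copy exposes it regardless of how small $\epsilon$ is; the pixel difference then shows how large the modification was.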

