We consider availability data poisoning attacks, where an adversary aims to degrade the overall test accuracy of a machine learning model by crafting small perturbations to its training data. Existing poisoning strategies can achieve the attack goal but assume that the victim employs the same learning method as the one the adversary uses to mount the attack. In this paper, we argue that this assumption is strong, since the victim may choose any learning algorithm to train the model as long as it achieves the targeted performance on clean data. Empirically, we observe a large decrease in the effectiveness of prior poisoning attacks when the victim employs an alternative learning algorithm. To enhance attack transferability, we propose Transferable Poisoning, which first leverages the intrinsic characteristics of alignment and uniformity to achieve better unlearnability within contrastive learning, and then iteratively utilizes gradient information from the supervised and unsupervised contrastive learning paradigms to generate the poisoning perturbations. Through extensive experiments on image benchmarks, we show that our transferable poisoning attack produces poisoned samples with significantly improved transferability, effective not only against the two learners used to devise the attack but also against learning algorithms and even paradigms beyond.
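To make the alternating optimization concrete, below is a minimal sketch of a single perturbation update, assuming PyTorch. The alignment and uniformity losses follow their standard formulations on the unit hypersphere; the names sup_model, cl_model, and augment, as well as all hyperparameter values, are illustrative placeholders rather than the paper's actual configuration. In the full method, this step would be interleaved with training updates of both surrogate models, alternating use_supervised across rounds.

```python
import torch
import torch.nn.functional as F

# Hypothetical hyperparameters; not taken from the paper.
EPS = 8 / 255      # L-infinity perturbation budget
ALPHA = 0.8 / 255  # PGD step size

def alignment_loss(z1, z2):
    """Alignment: features of matched (augmented) pairs should be close."""
    return (z1 - z2).pow(2).sum(dim=1).mean()

def uniformity_loss(z, t=2.0):
    """Uniformity: features should spread out on the unit hypersphere."""
    return torch.pdist(z, p=2).pow(2).mul(-t).exp().mean().log()

def poison_step(delta, x, y, sup_model, cl_model, augment, use_supervised):
    """One PGD update of the poisoning perturbation delta, minimizing the
    training loss of the current surrogate so the data becomes unlearnable."""
    delta = delta.clone().detach().requires_grad_(True)
    x_adv = (x + delta).clamp(0, 1)
    if use_supervised:
        # Supervised surrogate: minimize cross-entropy on poisoned inputs.
        loss = F.cross_entropy(sup_model(x_adv), y)
    else:
        # Contrastive surrogate: minimize alignment + uniformity losses
        # over two independently augmented views of the poisoned inputs.
        z1 = F.normalize(cl_model(augment(x_adv)), dim=1)
        z2 = F.normalize(cl_model(augment(x_adv)), dim=1)
        loss = alignment_loss(z1, z2) + uniformity_loss(z1)
    grad = torch.autograd.grad(loss, delta)[0]
    # Error-minimizing update: step against the training-loss gradient,
    # then project back onto the L-infinity ball of radius EPS.
    delta = (delta - ALPHA * grad.sign()).clamp(-EPS, EPS)
    return delta.detach()
```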