Deep learning models are often trained on distributed, web-scale datasets crawled from the internet. In this paper, we introduce two new dataset poisoning attacks that intentionally introduce malicious examples to degrade a model's performance. Our attacks are immediately practical and could, today, poison 10 popular datasets. Our first attack, split-view poisoning, exploits the mutable nature of internet content to ensure that a dataset annotator's initial view of the dataset differs from the view downloaded by subsequent clients. By exploiting specific invalid trust assumptions, we show how we could have poisoned 0.01% of the LAION-400M or COYO-700M datasets for just $60 USD. Our second attack, frontrunning poisoning, targets web-scale datasets that periodically snapshot crowd-sourced content -- such as Wikipedia -- where an attacker needs only a time-limited window to inject malicious examples. In light of both attacks, we notified the maintainers of each affected dataset and recommended several low-overhead defenses.
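To make the split-view scenario concrete, the following minimal Python sketch shows one way a mismatch between the annotator's view and a later client's view could be detected. It assumes the dataset index stores a SHA-256 digest next to each URL; that design, the function names, and the example URL and byte strings are all hypothetical illustrations, not the defenses or data described in this paper.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the raw content bytes."""
    return hashlib.sha256(data).hexdigest()


def build_index(snapshot: dict) -> dict:
    """Map each URL to the digest of the content seen at annotation time.

    Assumption: the index publisher records digests alongside URLs.
    """
    return {url: sha256_hex(body) for url, body in snapshot.items()}


def verify_download(index: dict, url: str, body: bytes) -> bool:
    """True only if the downloaded bytes match the recorded digest."""
    return index.get(url) == sha256_hex(body)


# Hypothetical view seen by the dataset annotator.
annotated = {"http://example.com/cat.jpg": b"original image bytes"}
index = build_index(annotated)

# Hypothetical view served later, after the content behind the URL changed.
downloaded = {"http://example.com/cat.jpg": b"attacker-controlled bytes"}

# URLs whose current content no longer matches what was annotated.
mismatched = [
    url for url, body in downloaded.items()
    if not verify_download(index, url, body)
]
print(mismatched)
```

Under these assumptions, a client would discard or flag every URL in `mismatched` rather than train on it. A byte-level digest also rejects benign changes such as re-encoding, so this is a sketch of the idea rather than a complete defense.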