Deep learning models are often trained on distributed, web-scale datasets crawled from the internet. In this paper, we introduce two new dataset poisoning attacks that intentionally introduce malicious examples to degrade a model's performance. Our attacks are immediately practical and could, today, poison 10 popular datasets. Our first attack, split-view poisoning, exploits the mutable nature of internet content to ensure a dataset annotator's initial view of the dataset differs from the view downloaded by subsequent clients. By exploiting specific invalid trust assumptions, we show how we could have poisoned 0.01% of the LAION-400M or COYO-700M datasets for just $60 USD. Our second attack, frontrunning poisoning, targets web-scale datasets that periodically snapshot crowd-sourced content -- such as Wikipedia -- where an attacker only needs a time-limited window to inject malicious examples. In light of both attacks, we notified the maintainers of each affected dataset and recommended several low-overhead defenses.
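One low-overhead defense against split-view poisoning implied by the attack model is to pin each URL's content with a cryptographic digest at annotation time, so that later downloads can be checked for tampering. Below is a minimal sketch of that idea in Python; the function names are illustrative, not drawn from any dataset's actual tooling.

```python
import hashlib

def record_digest(content: bytes) -> str:
    """At annotation time, store the SHA-256 digest of the content
    alongside its URL in the dataset index."""
    return hashlib.sha256(content).hexdigest()

def is_unmodified(downloaded: bytes, expected_sha256: str) -> bool:
    """At training time, re-check the downloaded bytes against the
    recorded digest; a mismatch means the URL now serves different
    content than the annotator originally saw."""
    return hashlib.sha256(downloaded).hexdigest() == expected_sha256
```

For example, a crawler that records `record_digest(content)` when building the index would later call `is_unmodified(...)` on each re-download and discard any example whose digest no longer matches, neutralizing content swapped in after annotation (e.g., via an expired domain).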