Indiscriminate data poisoning attacks aim to decrease a model's test accuracy by injecting a small amount of corrupted training data. Despite significant interest, existing attacks remain relatively ineffective against modern machine learning (ML) architectures. In this work, we introduce the notion of model poisonability as a technical tool to explore the intrinsic limits of data poisoning attacks. We derive an easily computable threshold to establish and quantify a surprising phase transition phenomenon among popular ML models: data poisoning attacks become effective only when the poisoning ratio exceeds our threshold. Building on existing parameter corruption attacks and refining the Gradient Canceling attack, we perform extensive experiments to confirm our theoretical findings, test the predictability of our transition threshold, and significantly improve existing data poisoning baselines over a range of datasets and models. Our work highlights the critical role played by the poisoning ratio and offers new insights into existing empirical results, attacks, and mitigation strategies in data poisoning.