We study indiscriminate poisoning attacks on linear learners, in which an adversary injects a few crafted examples into the training data with the goal of forcing the induced model to incur higher test error. Motivated by the observation that linear learners on some datasets resist the best known attacks even without any defenses, we investigate whether datasets can be inherently robust to indiscriminate poisoning attacks for linear learners. For theoretical Gaussian distributions, we rigorously characterize the behavior of an optimal poisoning attack, defined as the poisoning strategy that attains the maximum risk of the induced model at a given poisoning budget. Our results prove that linear learners can indeed be robust to indiscriminate poisoning if the class-wise data distributions are well-separated with low variance and the constraint set containing all permissible poisoning points is small. These findings largely explain the drastic variation in the empirical performance of state-of-the-art poisoning attacks on linear learners across benchmark datasets, and they take an important initial step towards understanding why some learning tasks are vulnerable to data poisoning attacks.
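The qualitative claim in the abstract (well-separated, low-variance classes plus a small constraint set limit what poisoning can do) can be illustrated with a toy calculation. The sketch below is not the paper's analysis or its optimal attack. It assumes a one-dimensional, balanced two-class Gaussian mixture, swaps in a least-squares linear classifier fit on population moments, and uses a single naive attack that places all poisoning mass at one mislabeled point on the boundary of the constraint set [-u, u]. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_least_squares(mu, sigma, u, eps):
    """Fit h(x) = sign(w*x + b) by least squares on population moments.

    Clean data (assumed): x | y=+1 ~ N(+mu, sigma^2), x | y=-1 ~ N(-mu, sigma^2),
    with balanced classes.
    Poison (assumed): mass eps, relative to the clean data, at (x=u, y=-1).
    """
    total = 1.0 + eps
    mean_x = eps * u / total
    mean_y = -eps / total
    mean_xy = (mu - eps * u) / total
    mean_xx = (mu ** 2 + sigma ** 2 + eps * u ** 2) / total
    w = (mean_xy - mean_x * mean_y) / (mean_xx - mean_x ** 2)
    b = mean_y - w * mean_x
    return w, b


def clean_risk(w, b, mu, sigma):
    """Exact 0-1 risk of sign(w*x + b) on the clean Gaussian mixture."""
    if w == 0.0:
        return 0.5  # a constant prediction is wrong on exactly one class
    t = -b / w  # decision threshold
    err_pos = normal_cdf((t - mu) / sigma)        # P(x < t | y=+1)
    err_neg = 1.0 - normal_cdf((t + mu) / sigma)  # P(x > t | y=-1)
    risk = 0.5 * (err_pos + err_neg)
    # If w < 0 the classifier is flipped, so its risk is the complement.
    return risk if w > 0 else 1.0 - risk


def risk_increase(mu, sigma, u, eps):
    """Clean-distribution risk after poisoning minus risk without poisoning."""
    w0, b0 = fit_least_squares(mu, sigma, u, 0.0)
    w1, b1 = fit_least_squares(mu, sigma, u, eps)
    return clean_risk(w1, b1, mu, sigma) - clean_risk(w0, b0, mu, sigma)


# Same 3% poisoning budget in both (hypothetical) settings.
robust = risk_increase(mu=2.0, sigma=0.5, u=3.0, eps=0.03)
vulnerable = risk_increase(mu=0.5, sigma=1.0, u=10.0, eps=0.03)
print(f"well-separated, low variance, small constraint set: +{robust:.5f}")
print(f"overlapping classes, large constraint set:          +{vulnerable:.5f}")
```

Using population moments and the Gaussian CDF keeps the example deterministic, so any difference between the two settings comes from the distribution and the constraint set rather than from sampling noise. A hinge-loss or logistic learner and a stronger attack would give different numbers; only the direction of the comparison is meant to carry over.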