Training modern neural networks or other models typically requires averaging over a sample of high-dimensional vectors. Poisoning attacks can bias the averaged vectors used to train the model, forcing it to learn specific patterns or preventing it from learning anything useful. Byzantine robust aggregation is a principled algorithmic defense against such biasing. Robust aggregators can bound the maximum bias in computing centrality statistics, such as the mean, even when some fraction of the inputs are arbitrarily corrupted. Designing such aggregators is challenging in high dimensions. However, the first polynomial-time algorithms with strong theoretical bounds on the bias have recently been proposed. Their bounds are independent of the number of dimensions, promising a conceptual limit on the power of poisoning attacks in their ongoing arms race against defenses. In this paper, we present a new attack called HIDRA on practical realizations of these strong defenses, which subverts their claim of dimension-independent bias. HIDRA highlights a novel computational bottleneck that has not been a concern of prior information-theoretic analysis. Our experimental evaluation shows that our attacks almost completely destroy model performance, whereas existing attacks with the same goal have little effect. Our findings leave the arms race between poisoning attacks and provable defenses wide open.
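To make the notion of aggregation bias concrete, the following is a minimal illustrative sketch, not the HIDRA attack and not one of the dimension-independent aggregators discussed above. It compares the plain mean with the coordinate-wise median, a classical and much weaker robust aggregator, when a fraction of the inputs is replaced by a constant vector. The function names, the Gaussian model for honest inputs, the 20% corruption rate, and the constant-shift corruption are all assumptions chosen for illustration.

```python
import random
import statistics


def mean_aggregate(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def median_aggregate(vectors):
    """Coordinate-wise median: a simple, classical robust aggregator."""
    return [statistics.median(col) for col in zip(*vectors)]


def l2_norm(v):
    """Euclidean norm of a vector."""
    return sum(x * x for x in v) ** 0.5


def make_inputs(n_honest, n_corrupt, dim, shift, seed=0):
    """Honest vectors ~ N(0, I); corrupted vectors are all `shift` (assumed toy attack)."""
    rng = random.Random(seed)
    honest = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_honest)]
    corrupt = [[shift] * dim for _ in range(n_corrupt)]
    return honest + corrupt


# The honest mean is the zero vector, so the norm of each aggregate is its bias.
vectors = make_inputs(n_honest=80, n_corrupt=20, dim=50, shift=100.0)
mean_bias = l2_norm(mean_aggregate(vectors))
median_bias = l2_norm(median_aggregate(vectors))

print(f"mean bias   : {mean_bias:.2f}")
print(f"median bias : {median_bias:.2f}")
```

In this toy setting the median's bias is far smaller than the mean's, but it is not zero: each coordinate is shifted slightly, so the total bias still grows with the number of dimensions. That residual dimension dependence is the kind of weakness the strong aggregators described above are designed to remove.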