Training modern neural networks or models typically requires averaging over a sample of high-dimensional vectors. Poisoning attacks can skew or bias the average vectors used to train the model, forcing the model to learn specific patterns or to avoid learning anything useful. Byzantine-robust aggregation is a principled algorithmic defense against such biasing. Robust aggregators can bound the maximum bias in computing centrality statistics, such as the mean, even when some fraction of inputs are arbitrarily corrupted. Designing such aggregators is challenging in high dimensions, but the first polynomial-time algorithms with strong theoretical bounds on the bias have recently been proposed. Their bounds are independent of the number of dimensions, promising a conceptual limit on the power of poisoning attacks in their ongoing arms race against defenses. In this paper, we present HIDRA, a new attack on practical realizations of these strong defenses that subverts their claim of dimension-independent bias. HIDRA highlights a novel computational bottleneck that has not been a concern of prior information-theoretic analyses. Our experimental evaluation shows that our attacks almost completely destroy model performance, whereas existing attacks with the same goal fail to have much effect. Our findings leave the arms race between poisoning attacks and provable defenses wide open.
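To make the notion of a robust aggregator concrete, the sketch below uses a coordinate-wise trimmed mean, one of the simplest Byzantine-robust aggregation rules. This is an illustration of the general idea only, not the stronger (dimension-independent) aggregators or the HIDRA attack discussed in the paper; the function name, the toy data, and the corruption fraction `eps` are all assumptions for the example.

```python
import numpy as np

def trimmed_mean(points, eps):
    """Coordinate-wise trimmed mean: in each dimension, drop the
    eps-fraction largest and smallest values, then average the rest.
    Illustrative aggregator only; not the paper's defense."""
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    k = int(np.ceil(eps * n))  # points trimmed per side, per coordinate
    sorted_pts = np.sort(points, axis=0)
    if k == 0:
        return sorted_pts.mean(axis=0)
    return sorted_pts[k:n - k].mean(axis=0)

# Toy setting: 90 honest gradient vectors near the origin,
# 10 poisoned vectors pushed far away by an attacker.
rng = np.random.default_rng(0)
honest = rng.normal(0.0, 1.0, size=(90, 5))
poisoned = np.full((10, 5), 100.0)
inputs = np.vstack([honest, poisoned])

naive = inputs.mean(axis=0)              # dragged far toward the outliers
robust = trimmed_mean(inputs, eps=0.1)   # stays near the honest mean
```

Trimming in each coordinate independently is cheap, but its worst-case bias grows with the dimension, which is exactly the gap the recently proposed strong aggregators close and that HIDRA targets in practice.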