In data poisoning attacks, an adversary tries to change a model's prediction by adding, modifying, or removing samples in the training data. Recently, ensemble-based approaches for obtaining provable defenses against data poisoning have been proposed, in which predictions are made by taking a majority vote across multiple base models. In this work, we show that considering only the majority vote in ensemble defenses is wasteful, as it does not effectively use the information available in the logit layers of the base models. Instead, we propose Run-Off Election (ROE), a novel aggregation method based on a two-round election across the base models: in the first round, models vote for their preferred class; a second, run-off election is then held between the top two classes from the first round. Based on this approach, we propose the DPA+ROE and FA+ROE defense methods, which build on the Deep Partition Aggregation (DPA) and Finite Aggregation (FA) approaches from prior work. We evaluate our methods on MNIST, CIFAR-10, and GTSRB and obtain improvements in certified accuracy of up to 3%-4%. Moreover, by applying ROE to a boosted version of DPA, we gain improvements of around 12%-27% compared to the current state of the art, establishing a new state of the art in (pointwise) certified robustness against data poisoning. In many cases, our approach outperforms the state of the art even when using 32 times less computational power.
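The two-round aggregation described above can be illustrated with a minimal sketch. It covers only the voting step, not the certification procedure, and it rests on assumptions not stated in the abstract: the function names `plurality_vote` and `run_off_election` are hypothetical, each base model is assumed to output one logit vector per input, and ties are assumed to be broken in favor of the smaller class index.

```python
import numpy as np


def plurality_vote(logits):
    """Baseline: each base model votes for its argmax class; most votes wins.

    logits: array-like of shape (n_models, n_classes).
    Ties go to the smaller class index (an assumption of this sketch).
    """
    logits = np.asarray(logits, dtype=float)
    votes = np.bincount(logits.argmax(axis=1), minlength=logits.shape[1])
    return int(np.argmax(votes))


def run_off_election(logits):
    """Two-round election over base-model logits (illustrative sketch).

    Round 1: every model votes for its top class; keep the two classes
             with the most votes.
    Round 2: every model votes for whichever of those two classes has the
             higher logit; the class with more votes is the prediction.
    Ties go to the smaller class index (an assumption of this sketch).
    """
    logits = np.asarray(logits, dtype=float)
    n_models, n_classes = logits.shape
    if n_classes < 2:
        raise ValueError("a run-off needs at least two classes")

    # Round 1: count first-choice votes per class.
    first_round = np.bincount(logits.argmax(axis=1), minlength=n_classes)

    # A stable sort on negated counts keeps the smaller index first on ties.
    top_two = np.argsort(-first_round, kind="stable")[:2]
    a, b = sorted(int(c) for c in top_two)  # a < b

    # Round 2: pairwise comparison of the two finalists in every model.
    votes_a = int(np.sum(logits[:, a] >= logits[:, b]))
    votes_b = n_models - votes_a
    return a if votes_a >= votes_b else b


# Five hypothetical base models, three classes.
example_logits = [
    [3.0, 1.0, 0.0],  # first choice: class 0
    [2.5, 0.5, 1.0],  # first choice: class 0
    [0.2, 2.0, 1.0],  # first choice: class 1
    [0.1, 1.5, 0.3],  # first choice: class 1
    [0.0, 1.0, 2.0],  # first choice: class 2, but prefers class 1 over class 0
]
baseline_prediction = plurality_vote(example_logits)
roe_prediction = run_off_election(example_logits)
```

In this toy input, classes 0 and 1 tie in the first round, so the plurality baseline falls back on the tie-breaking rule. The run-off additionally uses the fifth model's preference between the two finalists, which is the kind of logit-level information a single majority vote discards.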