Sabre is a defense against adversarial examples that was accepted at IEEE S&P 2024. We first reveal significant flaws in its evaluation that are clear signs of gradient masking. We then show the cause of this gradient masking: a bug in the original evaluation code. By fixing a single line of code in the original repository, we reduce Sabre's robust accuracy to 0%. In response, the authors modified the defense and introduced a new defense component not described in the original paper. But this fix contains a second bug; modifying one more line of code reduces robust accuracy to below baseline levels. After we released the first version of our paper online, the authors introduced another change to the defense; by commenting out one line of code during the attack, we again reduce the robust accuracy to 0%.