Evaluating the robustness of a defense model is a challenging task in adversarial robustness research. Obfuscated gradients have previously been found in many defense methods, where they give a false signal of robustness. In this paper, we identify a more subtle situation, called Imbalanced Gradients, that can also cause adversarial robustness to be overestimated. Imbalanced gradients occur when the gradient of one term of the margin loss dominates and pushes the attack in a suboptimal direction. To exploit imbalanced gradients, we formulate a Margin Decomposition (MD) attack that decomposes a margin loss into individual terms and then explores the attackability of these terms separately via a two-stage process. We also propose multi-targeted and ensemble versions of our MD attack. By investigating 24 defense models proposed since 2018, we find that 11 models are susceptible to imbalanced gradients to some degree, and that our MD attack can reduce their robustness, as measured by the best standalone baseline attack, by more than 1%. We also provide an in-depth investigation of the likely causes of imbalanced gradients and of effective countermeasures. Our code is available at https://github.com/HanxunH/MDAttack.
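To make the idea of decomposing a margin loss concrete, the following is a minimal sketch on a toy linear classifier. It is not the authors' implementation (see the repository linked above for that). It assumes a margin loss of the form max_{i != y} z_i - z_y, splits it into the two terms -z_y and max_{i != y} z_i, and compares the sizes of their input gradients. The function names, the L1-norm ratio used as an imbalance measure, and the simple schedule in `two_stage_attack` (one term first, then the full margin loss) are all illustrative assumptions, since the abstract does not specify these details.

```python
def logits(W, x):
    """Toy linear classifier (an assumption): one logit per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def margin_terms(W, x, y):
    """Split the margin loss max_{i != y} z_i - z_y into its two terms.

    Returns (value, input_gradient) for the -z_y term and for the
    max_{i != y} z_i term. For a linear model the input gradient of
    logit z_i is simply row i of W.
    """
    z = logits(W, x)
    j = max((i for i in range(len(z)) if i != y), key=lambda i: z[i])
    true_term = (-z[y], [-w for w in W[y]])
    other_term = (z[j], list(W[j]))
    return true_term, other_term


def margin_loss(W, x, y):
    """Full margin loss, i.e. the sum of the two decomposed terms."""
    (l_true, _), (l_other, _) = margin_terms(W, x, y)
    return l_true + l_other


def imbalance_ratio(W, x, y):
    """Hypothetical imbalance measure: ratio of the larger to the smaller
    L1 norm of the two terms' input gradients (1.0 means balanced)."""
    (_, g_true), (_, g_other) = margin_terms(W, x, y)
    a = sum(abs(v) for v in g_true)
    b = sum(abs(v) for v in g_other)
    return max(a, b) / max(min(a, b), 1e-12)


def sign(v):
    return (v > 0) - (v < 0)


def two_stage_attack(W, x0, y, eps, step, n_steps, stage1_steps, term="other"):
    """Sign-gradient ascent inside an L-infinity ball of radius eps.

    Stage 1 follows the gradient of a single term only; stage 2 switches
    to the gradient of the full margin loss. This schedule is an
    illustrative assumption, not the schedule used in the paper.
    """
    x = list(x0)
    for t in range(n_steps):
        true_term, other_term = margin_terms(W, x, y)
        if t < stage1_steps:
            g = true_term[1] if term == "true" else other_term[1]
        else:
            g = [a + b for a, b in zip(true_term[1], other_term[1])]
        x = [xi + step * sign(gi) for xi, gi in zip(x, g)]
        # Project back onto the L-infinity ball around the clean input.
        x = [min(max(xi, c - eps), c + eps) for xi, c in zip(x, x0)]
    return x


if __name__ == "__main__":
    W = [[4.0, 0.0], [0.0, 0.5], [0.0, 0.25]]  # made-up weights
    x0, y = [1.0, 1.0], 0
    print("imbalance ratio:", imbalance_ratio(W, x0, y))
    adv = two_stage_attack(W, x0, y, eps=0.5, step=0.25, n_steps=4, stage1_steps=2)
    print("margin before:", margin_loss(W, x0, y), "after:", margin_loss(W, adv, y))
```

In this toy setup the gradient of the -z_y term is much larger than that of the other term, which is the kind of dominance the abstract describes. A real evaluation would replace the linear model with a defended network and compute the gradients by automatic differentiation.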