Adversarial defenses are naturally evaluated on their ability to withstand adversarial attacks. To test defenses, diverse adversarial attacks are crafted, which are usually described in terms of their evasion capability and their L0, L1, L2, and Linf norms. We question whether evasion capability and L-norms are the most effective information for claiming that a defense has been tested against a representative attack set. To this end, we select image quality metrics from the state of the art and search for correlations between image perturbation and detectability. We observe that computing L-norms alone is rarely the preferable solution. We also observe a strong correlation between the identified metrics computed on an adversarial image and the output of a detector on that image, to the extent that the metrics can predict the detector's response with approximately 0.94 accuracy. Further, we observe that the metrics can classify attacks based on similar perturbations and similar detectability. This suggests revisiting the approach to evaluating detectors, so that additional metrics are included to ensure that a representative attack dataset is selected.
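For readers unfamiliar with the L-norms mentioned above, the following is a minimal sketch of how they are commonly computed on the perturbation between a clean image and its adversarial counterpart. It is not the authors' code: the function name `perturbation_norms`, the `tol` threshold used to decide whether a pixel changed, and the toy pixel values are all assumptions made for illustration, and images are represented as flat lists of intensities to keep the example dependency-free.

```python
import math

def perturbation_norms(clean, adversarial, tol=1e-12):
    """Return the L0, L1, L2, and Linf norms of (adversarial - clean).

    Both inputs are flat sequences of pixel intensities of equal length.
    `tol` (an illustrative assumption) is the threshold above which a
    pixel counts as changed for the L0 norm.
    """
    if len(clean) != len(adversarial):
        raise ValueError("images must have the same number of pixels")
    delta = [a - c for c, a in zip(clean, adversarial)]
    abs_delta = [abs(d) for d in delta]
    return {
        "L0": sum(1 for d in abs_delta if d > tol),  # number of changed pixels
        "L1": sum(abs_delta),                         # total absolute change
        "L2": math.sqrt(sum(d * d for d in delta)),   # Euclidean magnitude
        "Linf": max(abs_delta, default=0.0),          # largest single-pixel change
    }

# Toy 4-pixel "images" with intensities in [0, 1] (made-up values).
clean = [0.0, 0.5, 0.5, 1.0]
adversarial = [0.0, 0.75, 0.5, 0.5]
norms = perturbation_norms(clean, adversarial)
print(norms)
```

In this toy case two pixels change, so L0 is 2, L1 is 0.75, and Linf is 0.5. All four values summarize only the size of the perturbation, not how it affects perceived image quality, which is the gap the additional image quality metrics are meant to address.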