Machine learning models are vulnerable to tiny adversarial input perturbations that are optimized to cause a very large output error. To measure this vulnerability, we need reliable methods that can find such adversarial perturbations. For image classification models, evaluation methodologies have emerged that have stood the test of time. However, we argue that in semantic segmentation, a good approximation of the sensitivity to adversarial perturbations requires significantly more effort than what is currently considered satisfactory. To support this claim, we re-evaluate a number of well-known robust segmentation models in an extensive empirical study. We propose new attacks and combine them with the strongest attacks available in the literature. We also analyze the sensitivity of the models in fine detail. The results indicate that most state-of-the-art models are dramatically more sensitive to adversarial perturbations than previously reported. We also demonstrate a size bias: small objects are often more easily attacked, even when large objects are robust, a phenomenon that current evaluation metrics do not reveal. Finally, our results show that a diverse set of strong attacks is necessary, because different models are often vulnerable to different attacks.
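The size bias mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's evaluation protocol or any of its attacks; the label map, the "attacked" prediction, and the helper names `pixel_accuracy` and `per_class_accuracy` are all hypothetical, chosen only to show how an aggregate pixel-level score can stay high while a small object is lost completely.

```python
def pixel_accuracy(truth, pred):
    """Fraction of pixels whose predicted label matches the ground truth."""
    total = sum(len(row) for row in truth)
    correct = sum(t == p for tr, pr in zip(truth, pred) for t, p in zip(tr, pr))
    return correct / total


def per_class_accuracy(truth, pred):
    """Accuracy computed separately over the pixels of each ground-truth class."""
    hits, counts = {}, {}
    for tr, pr in zip(truth, pred):
        for t, p in zip(tr, pr):
            counts[t] = counts.get(t, 0) + 1
            hits[t] = hits.get(t, 0) + (t == p)
    return {c: hits[c] / counts[c] for c in counts}


# Hypothetical 10x10 label map: class 0 is a large region, class 1 a 2x2 object.
truth = [[0] * 10 for _ in range(10)]
for r in (4, 5):
    for c in (4, 5):
        truth[r][c] = 1

# Hypothetical attacked prediction: the small object is erased entirely,
# while every pixel of the large region is still labeled correctly.
attacked = [[0] * 10 for _ in range(10)]

overall = pixel_accuracy(truth, attacked)
by_class = per_class_accuracy(truth, attacked)
print(overall)   # 0.96
print(by_class)  # {0: 1.0, 1: 0.0}
```

In this toy case the overall pixel accuracy is 0.96 even though the small object is misclassified in every pixel, which is the kind of effect an aggregate metric would not reveal.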