State-of-the-art deep neural networks have proven extremely powerful in a variety of perceptual tasks such as semantic segmentation. However, these networks are vulnerable to adversarial perturbations of the input that are imperceptible to humans but lead to incorrect predictions. Because image segmentation can be treated as a sum of pixel-wise classifications, adversarial attacks developed for classification models have been shown to apply to segmentation models as well. In this work, we present simple uncertainty-based weighting schemes for the loss functions of such attacks that (i) put higher weights on pixel classifications that can be more easily perturbed and (ii) zero out the pixel-wise losses corresponding to pixels that are already confidently misclassified. The weighting schemes can be easily integrated into the loss functions of a range of well-known adversarial attackers with minimal additional computational overhead, yet lead to significantly improved perturbation performance, as we demonstrate in our empirical analysis on several datasets and models.
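To make the two weighting ideas concrete, the following is a minimal NumPy sketch, not the weighting scheme of this work itself. It rests on assumptions that the abstract does not specify: "more easily perturbed" is approximated by a small probability margin between the true class and the strongest competing class, and "confidently misclassified" means the predicted class differs from the label with maximum softmax probability at least `tau`. The function name `weighted_pixel_loss`, the threshold `tau`, and the margin-based weight are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def weighted_pixel_loss(logits, labels, tau=0.9):
    """Weighted sum of pixel-wise cross-entropy losses (illustrative only).

    logits: float array of shape (H, W, C)
    labels: integer array of shape (H, W) with values in [0, C)
    tau:    assumed confidence threshold for "confidently misclassified"
    Returns (loss, weights) where weights has shape (H, W).
    """
    probs = softmax(logits)
    idx = labels[..., None]

    # Probability assigned to the true class, and pixel-wise cross-entropy.
    true_p = np.take_along_axis(probs, idx, axis=-1)[..., 0]
    ce = -np.log(true_p + 1e-12)

    # Highest probability among the wrong classes.
    masked = probs.copy()
    np.put_along_axis(masked, idx, -np.inf, axis=-1)
    other_p = masked.max(axis=-1)

    # (i) Assumption: a small margin means the pixel is easier to perturb,
    #     so it receives a larger weight in [0, 1].
    margin = true_p - other_p
    weights = 1.0 - np.clip(margin, 0.0, 1.0)

    # (ii) Zero out pixels that are already confidently misclassified.
    pred = probs.argmax(axis=-1)
    confident_wrong = (pred != labels) & (probs.max(axis=-1) >= tau)
    weights = np.where(confident_wrong, 0.0, weights)

    return float((weights * ce).sum()), weights

# Toy 1x3 "image" with two classes; every pixel has true label 0.
logits = np.array([[[5.0, 0.0],    # confidently correct
                    [0.1, 0.0],    # uncertain
                    [0.0, 5.0]]])  # confidently misclassified
labels = np.zeros((1, 3), dtype=int)
loss, weights = weighted_pixel_loss(logits, labels)
```

In this toy input, the uncertain pixel receives a larger weight than the confidently correct one, and the confidently misclassified pixel receives weight zero. In a gradient-based untargeted attack, a weighted loss of this kind would take the place of the unweighted sum of pixel-wise losses that the attacker ascends. The weights are cheap to compute because they reuse the softmax output of the forward pass.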