State-of-the-art deep neural networks have proven highly effective in a broad range of tasks, including semantic image segmentation. However, these networks are vulnerable to adversarial attacks, i.e., imperceptible perturbations added to the input image that cause incorrect predictions, which is hazardous in safety-critical applications such as automated driving. Adversarial examples and defense strategies are well studied for image classification, whereas research in the context of semantic segmentation has been limited. Early work nevertheless shows that the segmentation output can be severely distorted by adversarial attacks. In this work, we introduce an uncertainty-based method for detecting adversarial attacks in semantic segmentation. We observe that uncertainty, as captured for example by the entropy of the output distribution, behaves differently on clean and perturbed images, and we use this property to distinguish between the two cases. Our method is lightweight and operates as a post-processing step, i.e., we neither modify the model nor require knowledge of the process used to generate the adversarial examples. In a thorough empirical analysis, we demonstrate the ability of our approach to detect perturbed images across multiple types of adversarial attacks.
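To illustrate the kind of uncertainty measure referred to above, the following is a minimal sketch of a per-pixel entropy computed from a segmentation network's softmax output, reduced to one score per image. It is not the method evaluated in this work. The function names, the use of the mean as the aggregate, and the parameters `clean_mean` and `tolerance` are hypothetical and introduced only for illustration. Because the abstract states only that entropy behaves differently on clean and perturbed images, the sketch flags a deviation in either direction from a reference value estimated on clean data.

```python
import numpy as np


def pixelwise_entropy(probs, eps=1e-12):
    """Normalized Shannon entropy per pixel.

    probs: array of shape (C, H, W) holding softmax probabilities over C classes.
    Returns an (H, W) array with values in [0, 1].
    """
    num_classes = probs.shape[0]
    # Clip to avoid log(0); zero-probability classes then contribute 0.
    clipped = np.clip(probs, eps, 1.0)
    entropy = -np.sum(probs * np.log(clipped), axis=0)
    # Divide by log(C) so that a uniform distribution yields 1.
    return entropy / np.log(num_classes)


def mean_entropy(probs):
    """Aggregate the entropy map to one score per image (assumed aggregate: mean)."""
    return float(pixelwise_entropy(probs).mean())


def is_perturbed(probs, clean_mean, tolerance):
    """Hypothetical decision rule: flag the image if its mean entropy deviates
    from a reference value estimated on clean images by more than `tolerance`."""
    return abs(mean_entropy(probs) - clean_mean) > tolerance
```

Such a check needs only the network output, so it can run as a post-processing step without changing the model. In practice `clean_mean` and `tolerance` would have to be estimated on held-out clean images.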