The attention mechanism has proven effective on a variety of visual tasks in recent years. In semantic segmentation, it is employed by a wide range of methods with both Convolutional Neural Network (CNN) and Vision Transformer (ViT) backbones. However, we observe that the attention mechanism is vulnerable to patch-based adversarial attacks. Through an analysis of the effective receptive field, we attribute this vulnerability to the wide receptive field induced by global attention, which may allow the influence of an adversarial patch to spread across the image. To address this issue, we propose a Robust Attention Mechanism (RAM) that improves the robustness of semantic segmentation models and notably relieves their vulnerability to patch-based attacks. Compared with the vanilla attention mechanism, RAM introduces two novel modules, Max Attention Suppression and Random Attention Dropout, both of which refine the attention matrix and limit the influence of a single adversarial patch on the segmentation results at other positions. Extensive experiments demonstrate that RAM improves the robustness of semantic segmentation models against various patch-based attack methods under different attack settings.
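To make the two modules concrete, the following is a minimal sketch of how a refined attention matrix along these lines could be computed. The abstract only names Max Attention Suppression and Random Attention Dropout without specifying them, so the exact formulations here (capping each attention weight at a threshold `tau` and zeroing entries with probability `p`, each followed by row renormalization) are assumptions, not the paper's method.

```python
import numpy as np

def robust_attention(scores, tau=0.1, p=0.1, rng=None):
    """Hypothetical sketch of the two RAM modules named in the abstract.

    Max Attention Suppression: cap each softmax weight at `tau` so no single
    (possibly adversarial) patch dominates the attention of other positions.
    Random Attention Dropout: zero attention entries with probability `p` so a
    patch cannot reliably influence every other position.
    Both steps renormalize rows; the exact formulations are assumptions.
    """
    rng = np.random.default_rng(rng)
    # Numerically stable softmax over the key axis.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    # Max Attention Suppression: clip each weight, then renormalize each row.
    w = np.minimum(w, tau)
    w /= w.sum(axis=-1, keepdims=True)
    # Random Attention Dropout: mask entries, then renormalize each row
    # (guarding against a fully dropped row).
    mask = rng.random(w.shape) >= p
    w = w * mask
    w /= np.maximum(w.sum(axis=-1, keepdims=True), 1e-12)
    return w
```

In this sketch, capping limits how much attention any one key (e.g. a patch location) can receive, while the random mask stochastically severs attention paths, so a single patch cannot deterministically steer every query position.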