Adversarial patch attacks are an emerging security threat for real-world deep learning applications. We present Demasked Smoothing, to our knowledge the first approach to certify the robustness of semantic segmentation models against this threat model. Previous work on certifiably defending against patch attacks has mostly focused on the image classification task and has often required changes to the model architecture and additional training, which is undesirable and computationally expensive. Demasked Smoothing can be applied to any segmentation model without dedicated training, fine-tuning, or restrictions on the architecture. Using different masking strategies, Demasked Smoothing can be applied to both certified detection and certified recovery. In extensive experiments on the ADE20K dataset, we show that Demasked Smoothing can on average certify 64% of the pixel predictions against a 1% patch in the detection task and 48% against a 0.5% patch in the recovery task.
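To make the masking idea concrete, the following is a minimal sketch of how certified detection could work under simplifying assumptions; it is not the authors' implementation. It assumes column-shaped masks, and the names `column_masks`, `certify_detection`, `segment`, and `demask` are hypothetical: `segment` stands in for an arbitrary segmentation model and `demask` for an inpainting model that fills in the hidden region. A pixel is marked certified when its prediction is unchanged under every mask. Since every possible patch is fully hidden by at least one mask, a patch that changes such a pixel would cause a disagreement that can be detected.

```python
import numpy as np


def column_masks(width, patch_w, mask_w):
    """Boolean keep-masks over columns (True = visible), one hidden band each.

    The stride is chosen so that every patch of width `patch_w` lies fully
    inside at least one hidden band (this requires mask_w >= patch_w).
    """
    if mask_w < patch_w:
        raise ValueError("mask_w must be at least patch_w")
    stride = mask_w - patch_w + 1
    starts = list(range(0, max(width - mask_w, 0) + 1, stride))
    if starts[-1] + mask_w < width:
        starts.append(width - mask_w)  # make sure the right edge is covered
    masks = []
    for s in starts:
        keep = np.ones(width, dtype=bool)
        keep[s:s + mask_w] = False
        masks.append(keep)
    return masks


def certify_detection(image, segment, demask, patch_w, mask_w):
    """Return (prediction, certified) for an H x W x C image.

    `segment(img) -> (H, W) integer labels` and `demask(img, keep) -> img`
    are placeholders (assumptions of this sketch), not a real library API.
    """
    h, w = image.shape[:2]
    pred = segment(image)
    certified = np.ones((h, w), dtype=bool)
    for keep_cols in column_masks(w, patch_w, mask_w):
        keep = np.broadcast_to(keep_cols, (h, w))
        masked = image * keep[..., None]  # zero out the hidden band
        # A pixel stays certified only if this masked prediction agrees.
        certified &= segment(demask(masked, keep)) == pred
    return pred, certified
```

The fraction of certified pixels is then `certified.mean()`. Certified recovery relies on a different masking strategy and is not covered by this sketch.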