PatchCensor: Patch Robustness Certification for Transformers via Exhaustive Testing

From arXiv: This paper has been accepted by ACM Transactions on Software Engineering and Methodology (TOSEM'23) in the "Continuous Special Section: AI and SE." Please cite the TOSEM version.

Like other classical neural networks, the Vision Transformer (ViT) is known to be highly nonlinear and can be easily fooled by both natural and adversarial patch perturbations. This limitation could threaten the deployment of ViT in real industrial environments, especially in safety-critical scenarios. In this work, we propose PatchCensor, which aims to certify the patch robustness of ViT by applying exhaustive testing. We try to provide a provable guarantee by considering the worst-case patch attack scenarios. Unlike empirical defenses against adversarial patches, which may be adaptively breached, certified robust approaches can provide a certified accuracy against arbitrary attacks under certain conditions. However, existing robustness certifications are mostly based on robust training, which often requires substantial training effort and sacrifices model performance on normal samples. To bridge the gap, PatchCensor seeks to improve the robustness of the whole system by detecting abnormal inputs, instead of training a robust model and asking it to give reliable results for every input, which may inevitably compromise accuracy. Specifically, each input is tested by voting over multiple inferences with different mutated attention masks, where at least one inference is guaranteed to exclude the abnormal patch. This can be seen as complete-coverage testing, which could provide a statistical guarantee on inference at test time. Our comprehensive evaluation demonstrates that PatchCensor is able to achieve high certified accuracy (e.g., 67.1% on ImageNet for 2%-pixel adversarial patches), significantly outperforming state-of-the-art techniques while achieving similar clean accuracy (81.8% on ImageNet). Meanwhile, our technique also supports flexible configurations to handle different adversarial patch sizes (up to 25%) by simply changing the masking strategy.
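
The voting idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`axis_origins`, `build_masks`, `covers_every_patch`, `vote`), the square hidden-window layout, and the `stride` parameter are all assumptions made for illustration, and the abstract does not specify the actual mask shapes. The sketch works on a grid of ViT tokens, builds a set of masks that each hide one square window, exhaustively checks that every possible patch position is fully hidden by at least one mask, and treats a unanimous vote across the masked inferences as the "no alarm" case.

```python
from collections import Counter
from itertools import product


def axis_origins(grid, window, stride):
    """Start offsets along one axis so hidden windows span the whole axis."""
    last = grid - window
    origins = list(range(0, last + 1, stride))
    if origins[-1] != last:
        origins.append(last)  # make sure the final window touches the border
    return origins


def build_masks(grid, patch, stride):
    """Return (row, col, window) triples; each hides a window x window block.

    Assumption: a window of side patch + stride - 1 is hidden per mask, so any
    patch of side `patch` falls entirely inside at least one hidden window.
    """
    window = patch + stride - 1
    if window >= grid:
        raise ValueError("hidden window would cover the whole token grid")
    origins = axis_origins(grid, window, stride)
    return [(r, c, window) for r, c in product(origins, origins)]


def covers_every_patch(grid, patch, masks):
    """Exhaustively check that each patch position is hidden by some mask."""
    for pr, pc in product(range(grid - patch + 1), repeat=2):
        hidden = any(
            r <= pr and pr + patch <= r + w and c <= pc and pc + patch <= c + w
            for r, c, w in masks
        )
        if not hidden:
            return False
    return True


def vote(predictions):
    """Return (majority label, unanimous flag) over the masked inferences."""
    majority = Counter(predictions).most_common(1)[0][0]
    unanimous = all(p == majority for p in predictions)
    return majority, unanimous


# Hypothetical setting: a 14 x 14 token grid and a 2 x 2 token patch.
masks = build_masks(grid=14, patch=2, stride=2)
full_coverage = covers_every_patch(14, 2, masks)
```

In this sketch, each masked inference would be one forward pass with the hidden tokens excluded from attention; the model call itself is omitted because the abstract gives no API for it. If a clean input is classified correctly under every mask, then under any patch attack at least one inference never sees the patch and still returns the correct label, so the vote is either unanimous and correct or it disagrees and raises an alarm.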
