Certifiably robust defenses against adversarial patches for image classifiers guarantee correct predictions under any change to a constrained neighborhood of pixels. PatchCleanser (arXiv:2108.09135 [cs.CV]), the state-of-the-art certified defense, uses a double-masking strategy for robust classification. The success of this strategy relies heavily on the model's invariance to image pixel masking. In this paper, we take a closer look at model training schemes to improve this invariance. Instead of using Random Cutout (arXiv:1708.04552v2 [cs.CV]) augmentations as PatchCleanser does, we introduce the notion of worst-case masking, i.e., selecting the masked images that maximize classification loss. However, finding worst-case masks requires an exhaustive search, which might be prohibitively expensive to do on the fly during training. To solve this problem, we propose a two-round greedy masking strategy (Greedy Cutout) that finds an approximate worst-case mask location with much less compute. We show that models trained with Greedy Cutout achieve higher certified robust accuracy than models trained with the Random Cutout used in PatchCleanser, across a range of datasets and architectures. Certified robust accuracy on ImageNet with a ViT-B16-224 model increases from 58.1% to 62.3% against a 3% square patch applied anywhere on the image.
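The two-round greedy search can be illustrated with a minimal sketch. The abstract does not spell out the procedure, so the following assumes one plausible reading: round one tries every candidate mask location and keeps the one with the highest loss, and round two repeats the search with the first mask held fixed. The names `candidate_positions`, `apply_mask`, and `greedy_cutout`, the square grayscale or channel-last image layout, the zero fill value, and the stride-based candidate grid are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def candidate_positions(image_size, mask_size, stride):
    """Top-left corners of candidate masks on a regular grid (assumed layout)."""
    last = image_size - mask_size
    coords = list(range(0, last + 1, stride))
    if coords[-1] != last:
        coords.append(last)  # make sure the image border is covered
    return [(r, c) for r in coords for c in coords]


def apply_mask(image, pos, mask_size, fill=0.0):
    """Return a copy of `image` with a square region set to `fill`."""
    out = image.copy()
    r, c = pos
    out[r:r + mask_size, c:c + mask_size] = fill
    return out


def greedy_cutout(image, loss_fn, mask_size, stride):
    """Approximate the worst-case pair of masks with two greedy rounds.

    `loss_fn` maps a masked image to a scalar loss. During training it would
    be the classifier's loss on the true label (an assumption of this sketch).
    """
    positions = candidate_positions(image.shape[0], mask_size, stride)

    # Round 1: pick the single mask location that maximizes the loss.
    first = max(positions, key=lambda p: loss_fn(apply_mask(image, p, mask_size)))
    once_masked = apply_mask(image, first, mask_size)

    # Round 2: keep the first mask fixed and search for the second one.
    second = max(
        positions, key=lambda p: loss_fn(apply_mask(once_masked, p, mask_size))
    )
    return first, second, apply_mask(once_masked, second, mask_size)


if __name__ == "__main__":
    # Toy stand-in for a classifier loss: the loss grows as bright pixels are hidden.
    img = np.zeros((8, 8))
    img[0:2, 0:2] = 5.0
    img[6:8, 6:8] = 3.0
    first, second, masked = greedy_cutout(img, lambda x: -float(x.sum()), 2, 2)
    print(first, second, masked.sum())
```

With N candidate locations, this sketch evaluates the loss 2N times, whereas an exhaustive search over ordered pairs of masks would need on the order of N² evaluations. That gap is the kind of saving the abstract refers to as "much less compute".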