MIMIR: Masked Image Modeling for Mutual Information-based Adversarial Robustness

Vision Transformers (ViTs) achieve superior performance on various tasks compared to convolutional neural networks (CNNs), but ViTs are also vulnerable to adversarial attacks. Adversarial training is one of the most successful methods for building robust CNN models. Recent works have therefore explored new methodologies for adversarial training of ViTs based on the differences between ViTs and CNNs, such as better training strategies, preventing attention from focusing on a single block, or discarding low-attention embeddings. However, these methods still follow the design of traditional supervised adversarial training, which limits the potential of adversarial training on ViTs. This paper proposes a novel defense method, MIMIR, which aims to build a different adversarial training methodology by using Masked Image Modeling at pre-training. We create an autoencoder that accepts adversarial examples as input but takes the clean examples as the modeling target. Then, we create a mutual information (MI) penalty following the idea of the Information Bottleneck. Of the two information sources, the input and the corresponding adversarial perturbation, the perturbation information is eliminated by the constraint of the modeling target. Next, we provide a theoretical analysis of MIMIR using the bounds of the MI penalty. We also design two adaptive attacks for the case where the adversary is aware of the MIMIR defense and show that MIMIR still performs well. The experimental results show that MIMIR improves (natural and adversarial) accuracy on average by 4.19% on CIFAR-10 and 5.52% on ImageNet-1K, compared to baselines. On Tiny-ImageNet, we obtain an improvement in natural accuracy of 2.99% on average and comparable adversarial accuracy. Our code and trained models are publicly available at https://github.com/xiaoyunxxy/MIMIR.
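The core objective described above (adversarial examples as input, clean examples as the reconstruction target) can be illustrated with a small sketch. This is a toy illustration in pure Python, not the authors' implementation: the patch layout, the `mask_ratio` of 0.75, the `eps` of 8/255, the random sign noise standing in for a real gradient-based attack, and the `mean_patch_reconstruct` toy model are all assumptions made for the example. The MI penalty is omitted because the abstract does not state its exact form.

```python
import random


def perturb(patches, eps, rng):
    """Stand-in for an adversarial attack (assumption): random sign noise
    with L-infinity norm eps. A real attack would use loss gradients."""
    return [[v + eps * rng.choice((-1.0, 1.0)) for v in p] for p in patches]


def random_mask(num_patches, mask_ratio, rng):
    """Return the sorted indices of the patches hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    return sorted(rng.sample(range(num_patches), num_masked))


def mim_loss(clean, adv, masked_idx, reconstruct):
    """Mean squared error on masked patches.

    The model only sees the visible ADVERSARIAL patches, but its
    predictions are compared against the CLEAN patches.
    """
    masked = set(masked_idx)
    visible = {i: p for i, p in enumerate(adv) if i not in masked}
    pred = reconstruct(visible, masked_idx)  # masked index -> predicted patch
    total, count = 0.0, 0
    for i in masked_idx:
        for pv, cv in zip(pred[i], clean[i]):
            total += (pv - cv) ** 2
            count += 1
    return total / count if count else 0.0


def mean_patch_reconstruct(visible, masked_idx):
    """Toy 'autoencoder' (hypothetical): predict each masked patch as the
    element-wise mean of the visible patches."""
    n = len(visible)
    dim = len(next(iter(visible.values())))
    mean = [sum(p[d] for p in visible.values()) / n for d in range(dim)]
    return {i: list(mean) for i in masked_idx}


rng = random.Random(0)
clean = [[0.5] * 4 for _ in range(8)]            # 8 patches, 4 values each
adv = perturb(clean, eps=8 / 255, rng=rng)       # perturbed model input
masked_idx = random_mask(len(clean), mask_ratio=0.75, rng=rng)
loss = mim_loss(clean, adv, masked_idx, mean_patch_reconstruct)
```

In an actual training loop, `reconstruct` would be a ViT encoder-decoder and `loss` would be minimized by gradient descent; because the target is the clean image, the model gains nothing from preserving the perturbation in its latent representation.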
