MIMIR: Masked Image Modeling for Mutual Information-based Adversarial Robustness

Vision Transformers (ViTs) outperform convolutional neural networks (CNNs) on a variety of tasks, but they are also vulnerable to adversarial attacks. Adversarial training is one of the most successful methods for building robust CNN models. Recent work has therefore explored adversarial training methodologies for ViTs that exploit the differences between ViTs and CNNs, such as better training strategies, preventing attention from focusing on a single block, or discarding low-attention embeddings. However, these methods still follow the design of traditional supervised adversarial training, which limits the potential of adversarial training on ViTs. This paper proposes a novel defense method, MIMIR, which builds a different adversarial training methodology by using Masked Image Modeling at pre-training. We create an autoencoder that accepts adversarial examples as input but takes the clean examples as the modeling target. We then add a mutual information (MI) penalty following the idea of the Information Bottleneck. Of the two sources of information in the input, the clean example and the corresponding adversarial perturbation, the perturbation information is eliminated because of the constraint imposed by the modeling target. Next, we provide a theoretical analysis of MIMIR using the bounds of the MI penalty. We also design two adaptive attacks for the setting in which the adversary is aware of the MIMIR defense and show that MIMIR still performs well. The experimental results show that, compared to baselines, MIMIR improves (natural and adversarial) accuracy on average by 4.19\% on CIFAR-10 and 5.52\% on ImageNet-1K. On Tiny-ImageNet, natural accuracy improves by 2.99\% on average, with comparable adversarial accuracy. Our code and trained models are publicly available\footnote{\url{https://anonymous.4open.science/r/MIMIR-5444/README.md}}.
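
The core pre-training idea described above, adversarial input with a clean reconstruction target, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy autoencoder, and the random-noise stand-in for an adversarial attack are all assumptions made for illustration, the MI penalty is omitted, and the loss is averaged over all patches for simplicity.

```python
import random


def split_patches(num_patches, mask_ratio, rng):
    """Randomly split patch indices into (visible, masked) lists.

    Assumes mask_ratio < 1 so that at least one patch stays visible.
    """
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    return sorted(order[num_masked:]), sorted(order[:num_masked])


def perturb(patches, epsilon, rng):
    """Placeholder for an adversarial attack: bounded random noise.

    A real attack (e.g. PGD) would choose the perturbation by maximizing a
    loss; only the L-infinity bound epsilon is mimicked here.
    """
    return [[v + rng.uniform(-epsilon, epsilon) for v in patch]
            for patch in patches]


def toy_autoencoder(visible_patches, num_patches):
    """Hypothetical stand-in model: predicts the mean visible patch everywhere."""
    dim = len(visible_patches[0])
    mean = [sum(p[d] for p in visible_patches) / len(visible_patches)
            for d in range(dim)]
    return [list(mean) for _ in range(num_patches)]


def mse(predicted, target):
    """Mean squared error over all patch values."""
    total = sum((a - b) ** 2
                for p, t in zip(predicted, target)
                for a, b in zip(p, t))
    count = sum(len(p) for p in predicted)
    return total / count


def pretraining_loss(clean_patches, mask_ratio, epsilon, rng,
                     model=toy_autoencoder):
    """Encode visible ADVERSARIAL patches, score against the CLEAN patches."""
    visible, _masked = split_patches(len(clean_patches), mask_ratio, rng)
    adversarial = perturb(clean_patches, epsilon, rng)
    visible_adv = [adversarial[i] for i in visible]
    reconstruction = model(visible_adv, len(clean_patches))
    # The target is the clean image, so perturbation information is not
    # rewarded by the reconstruction objective.
    return mse(reconstruction, clean_patches)
```

Calling `pretraining_loss(clean_patches, 0.75, 8 / 255, random.Random(0))` on a list of equally sized patches returns a single scalar loss; the mask ratio and perturbation budget shown here are illustrative values, not settings taken from the paper.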
