Vision Transformers (ViTs) achieve excellent performance in various tasks, but they are also vulnerable to adversarial attacks. Building robust ViTs depends heavily on dedicated Adversarial Training (AT) strategies. However, adversarial training for ViTs currently only borrows well-established recipes from convolutional neural network (CNN) training, in which pre-training provides the basis for AT fine-tuning, aided by tailored data augmentations. In this paper, we take a closer look at the adversarial robustness of ViTs by providing a novel theoretical Mutual Information (MI) analysis of their autoencoder-based self-supervised pre-training. Specifically, we show that the MI between an adversarial example and its latent representation in ViT-based autoencoders should be constrained, which we achieve by exploiting MI bounds. Based on this finding, we propose a masked autoencoder-based pre-training method, MIMIR, which employs an MI penalty to facilitate the adversarial training of ViTs. Extensive experiments show that MIMIR outperforms state-of-the-art adversarially trained ViTs on benchmark datasets with higher natural and robust accuracy, indicating that ViTs can substantially benefit from exploiting MI. In addition, we consider two adaptive attacks by assuming that the adversary is aware of the MIMIR design, which further verifies the robustness MIMIR provides.
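For intuition, the sketch below shows one way an MI penalty can be attached to a masked-autoencoder reconstruction objective, assuming a PyTorch setup. The toy encoder/decoder, the `mi_weight` coefficient, and the Gaussian-KL surrogate used as the MI upper bound are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in for a ViT encoder over flattened patches."""
    def __init__(self, dim_in=196, dim_z=64):
        super().__init__()
        self.net = nn.Linear(dim_in, dim_z)
    def forward(self, x):
        return self.net(x)

class ToyDecoder(nn.Module):
    """Stand-in for the lightweight MAE decoder."""
    def __init__(self, dim_z=64, dim_out=196):
        super().__init__()
        self.net = nn.Linear(dim_z, dim_out)
    def forward(self, z):
        return self.net(z)

def mi_penalized_mae_loss(encoder, decoder, x_adv, target, mi_weight=0.1):
    """MAE-style reconstruction loss plus a penalty that bounds I(x_adv; z).

    Assumption: q(z | x_adv) is taken as a unit-variance Gaussian centred on
    the encoder output, so KL(q(z|x_adv) || N(0, I)) = 0.5 * ||z||^2, a
    standard variational upper bound on the mutual information.
    """
    z_adv = encoder(x_adv)                          # latent of the adversarial view
    loss_rec = F.mse_loss(decoder(z_adv), target)   # reconstruct clean patch pixels
    loss_mi = 0.5 * z_adv.pow(2).mean()             # Gaussian-KL MI surrogate (averaged)
    return loss_rec + mi_weight * loss_mi

# Usage on random stand-ins for masked adversarial patches:
enc, dec = ToyEncoder(), ToyDecoder()
x_adv = torch.randn(8, 196)    # adversarial input patches (flattened)
target = torch.randn(8, 196)   # clean pixels the decoder must recover
loss = mi_penalized_mae_loss(enc, dec, x_adv, target)
loss.backward()
```

The design intent captured here is that the reconstruction term preserves task-relevant information while the penalty discourages the latent code from retaining adversarial input detail; the actual bound and penalty weighting would follow the paper's analysis.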