Despite the remarkable success of Vision Transformers (ViTs) across a wide range of vision tasks, recent studies have revealed that, like Convolutional Neural Networks (CNNs), they remain vulnerable to adversarial examples. Adversarial training is a common empirical defense, yet the theoretical basis for the robustness it confers on ViTs remains largely unexplored. In this work, we present the first theoretical analysis of adversarial training for simplified ViT architectures. We show that, when the signal-to-noise ratio satisfies a certain condition and the perturbation budget is moderate, adversarial training enables ViTs to achieve nearly zero robust training loss and nearly zero robust generalization error. Remarkably, this yields strong generalization even though the model overfits the training data, a phenomenon known as \emph{benign overfitting} that had previously been observed only in adversarially trained CNNs. Experiments on both synthetic and real-world datasets further validate our theoretical findings.
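The abstract does not specify the data model or the simplified ViT, so the following is only a minimal sketch of the min-max objective that adversarial training optimizes. It deliberately swaps the ViT for a linear scorer $f(x) = \langle w, x \rangle$ under an $\ell_2$ perturbation budget $\varepsilon$, because in that case the inner maximization has a closed form: the worst-case perturbation is $\delta = -\varepsilon y\, w / \lVert w \rVert_2$, which lowers the margin by exactly $\varepsilon \lVert w \rVert_2$. All names (`adversarial_train`, `robust_loss`, `TOY_DATA`) and the toy data are hypothetical; this is not the paper's model, data distribution, or training procedure.

```python
import math
from typing import List, Tuple

# Hypothetical illustration: a linear scorer stands in for the simplified ViT.
Example = Tuple[List[float], int]  # (features, label in {-1, +1})


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def norm(u: List[float]) -> float:
    return math.sqrt(dot(u, u))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_loss(margin: float) -> float:
    # log(1 + exp(-margin)), computed stably for large |margin|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def worst_case_perturbation(w: List[float], x: List[float], y: int, eps: float) -> List[float]:
    # Maximizer of the loss over ||delta||_2 <= eps for a linear scorer:
    # move x against its label along w, i.e. delta = -eps * y * w / ||w||.
    n = norm(w)
    if n == 0.0:
        return [0.0] * len(x)  # with w = 0 every perturbation gives the same loss
    return [-eps * y * wi / n for wi in w]


def robust_margin(w: List[float], x: List[float], y: int, eps: float) -> float:
    delta = worst_case_perturbation(w, x, y, eps)
    x_adv = [xi + di for xi, di in zip(x, delta)]
    return y * dot(w, x_adv)  # equals y*<w, x> - eps*||w||_2


def robust_loss(w: List[float], data: List[Example], eps: float) -> float:
    # Average logistic loss on worst-case (adversarial) examples.
    return sum(logistic_loss(robust_margin(w, x, y, eps)) for x, y in data) / len(data)


def adversarial_train(data: List[Example], dim: int, eps: float,
                      lr: float = 0.1, steps: int = 500) -> List[float]:
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in data:
            # Inner maximization: attack the current model.
            delta = worst_case_perturbation(w, x, y, eps)
            x_adv = [xi + di for xi, di in zip(x, delta)]
            # Outer minimization: logistic-loss gradient at the adversarial
            # example, holding x_adv fixed (Danskin's theorem).
            m = y * dot(w, x_adv)
            coeff = -sigmoid(-m) * y  # d/dm log(1 + e^{-m}) = -sigmoid(-m)
            for j in range(dim):
                grad[j] += coeff * x_adv[j] / len(data)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy, linearly separable data (made up for illustration only).
TOY_DATA: List[Example] = [
    ([2.0, 0.5], 1),
    ([-2.0, -0.3], -1),
    ([1.5, -0.4], 1),
    ([-1.8, 0.2], -1),
]
w_hat = adversarial_train(TOY_DATA, dim=2, eps=0.5)
```

In this toy setting every training point ends up with a positive worst-case margin, so the robust training loss is driven well below its initial value of $\log 2$; the abstract's claims concern the much richer attention-based model and its generalization, which a linear sketch cannot reproduce.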