Although fast adversarial training provides an efficient approach for building robust networks, it may suffer from a serious problem known as catastrophic overfitting (CO), where the multi-step robust accuracy suddenly collapses to zero. In this paper, we decouple, for the first time, FGSM examples into data-information and self-information, which reveals an interesting phenomenon called "self-fitting". Self-fitting, in which DNNs learn the self-information embedded in single-step perturbations, naturally leads to CO. When self-fitting occurs, the network exhibits an obvious "channel differentiation" phenomenon: some convolution channels responsible for recognizing self-information become dominant, while others responsible for data-information are suppressed. As a result, the network learns to recognize only images with sufficient self-information and loses its ability to generalize to other types of data. Based on self-fitting, we provide new insight into existing methods for mitigating CO and extend CO to multi-step adversarial training. Our findings reveal a self-learning mechanism in adversarial training and open up new perspectives for suppressing different kinds of information to mitigate CO.
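To make the decoupling concrete, the following is a minimal sketch of a single-step FGSM example split into its two additive parts. It assumes that "data-information" corresponds to the clean input and "self-information" to the single-step perturbation, which is one reading of the abstract rather than the paper's exact definition. A toy logistic-regression model stands in for a DNN so that the input gradient can be written analytically; the function name `fgsm_decompose` and its parameters are hypothetical.

```python
import numpy as np

def fgsm_decompose(x, y, w, b, eps):
    """Build an FGSM example for a toy logistic-regression model.

    Returns (x_adv, data_part, self_part) with x_adv = data_part + self_part.
    Assumption: data_part is the clean input, self_part is the perturbation.
    """
    # Forward pass: predicted probability of class 1
    z = x @ w + b
    p = 1.0 / (1.0 + np.exp(-z))
    # Gradient of the binary cross-entropy loss w.r.t. the input is (p - y) * w
    grad_x = (p - y) * w
    # Single-step perturbation: eps times the sign of the input gradient.
    # It depends on the model parameters (w, b), not only on the data.
    self_part = eps * np.sign(grad_x)
    data_part = x
    x_adv = data_part + self_part
    return x_adv, data_part, self_part

# Usage with hypothetical values
x = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, -2.0, 0.5])
x_adv, data_part, self_part = fgsm_decompose(x, y=1.0, w=w, b=0.0, eps=0.1)
```

Because the perturbation is computed from the model's own gradient, it carries information about the network itself. Under the assumption above, that is the part a network could latch onto when self-fitting occurs.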