Supervised learning-based adversarial attack detection methods rely on large amounts of labeled data and suffer significant performance degradation when the trained model is applied to new domains. In this paper, we propose a self-supervised representation learning framework for the adversarial attack detection task to address this drawback. First, we map the pixels of augmented input images into an embedding space. Then, we employ a prototype-wise contrastive estimation loss to cluster prototypes as latent variables. In addition, drawing inspiration from the concept of memory banks, we introduce a discrimination bank that distinguishes and learns representations for each individual instance sharing the same or a similar prototype, establishing a connection between instances and their associated prototypes. We further propose a parallel axial-attention (PAA)-based encoder that accelerates training by computing attention over the height and width axes of the attention maps in parallel. Experimental results show that, compared with various benchmark self-supervised vision learning models and supervised adversarial attack detection methods, the proposed model achieves state-of-the-art performance on the adversarial attack detection task across a wide range of images.
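The abstract does not specify the exact form of the prototype-wise contrastive estimation loss, so the following is only a minimal numpy sketch of a standard prototype-level InfoNCE objective: each instance embedding is pulled toward its assigned prototype and pushed away from the others. The function name `proto_nce`, the temperature `tau`, and the random test data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def proto_nce(z, prototypes, assign, tau=0.1):
    """Prototype-wise contrastive (InfoNCE-style) loss: pull each
    embedding toward its assigned prototype, push it from the rest.
    z: (N, D) unit-norm embeddings; prototypes: (K, D) unit-norm
    cluster centroids (latent variables); assign: (N,) prototype ids."""
    logits = z @ prototypes.T / tau                      # (N, K) scaled similarities
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(z)), assign].mean()

rng = np.random.default_rng(0)
K, N, D = 4, 32, 16                                      # toy sizes (assumption)
prototypes = rng.normal(size=(K, D))
prototypes /= np.linalg.norm(prototypes, axis=1, keepdims=True)
assign = rng.integers(0, K, size=N)

# Random embeddings vs. embeddings collapsed onto their own prototype:
# the aligned case should yield a much lower loss.
z_rand = rng.normal(size=(N, D))
z_rand /= np.linalg.norm(z_rand, axis=1, keepdims=True)
loss_rand = proto_nce(z_rand, prototypes, assign)
loss_aligned = proto_nce(prototypes[assign], prototypes, assign)
```

Minimizing this objective drives the clustering behavior described above: instances sharing a prototype are drawn together in the embedding space, while the discrimination bank can then separate individual instances within each cluster.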
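To make the PAA encoder idea concrete, the sketch below applies self-attention along one spatial axis at a time of an H×W×C feature map; the height-axis and width-axis branches are independent and can therefore be computed in parallel. This is a hedged illustration under stated assumptions (Q = K = V = x, no learned projections, branches fused by summation), not the paper's architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def axial_attention(x, axis):
    """Self-attention along a single spatial axis of x: (H, W, C).
    Simplified: queries, keys, and values are all x (assumption)."""
    q = np.moveaxis(x, axis, 0)                               # (L, M, C)
    scores = np.einsum('imc,jmc->mij', q, q) / np.sqrt(x.shape[-1])
    out = np.einsum('mij,jmc->imc', softmax(scores), q)       # attend within the axis
    return np.moveaxis(out, 0, axis)                          # restore layout

H, W, C = 4, 5, 8                                             # toy sizes (assumption)
x = np.random.default_rng(0).normal(size=(H, W, C))
# The two axial branches share no intermediate state, so they can be
# trained in parallel; summation as the fusion step is an assumption.
y = axial_attention(x, axis=0) + axial_attention(x, axis=1)
```

Attending over each axis separately reduces the attention cost from O((HW)²) for full 2-D self-attention to O(HW·(H+W)), which is the usual motivation for axial factorizations.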