To protect deep neural networks (DNNs) from adversarial attacks, adversarial training (AT) was developed, which incorporates adversarial examples (AEs) into model training. Recent studies show that adversarial attacks disproportionately affect the patterns in the phase of a sample's frequency spectrum (which typically carries crucial semantic information) compared with those in the amplitude, causing the model to misclassify AEs. We find that mixing the amplitude of each training sample's frequency spectrum with that of a distractor image during AT guides the model to focus on phase patterns unaffected by adversarial perturbations, and thereby improves the model's robustness. However, selecting appropriate distractor images remains challenging, since they should mix the amplitude without affecting the phase patterns. To this end, we propose an optimized Adversarial Amplitude Generator (AAG) that achieves a better tradeoff between improving the model's robustness and retaining phase patterns. Based on this generator, together with an efficient AE generation procedure, we design a new Dual Adversarial Training (DAT) strategy. Experiments on various datasets show that the proposed DAT significantly improves robustness against diverse adversarial attacks.
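The amplitude-mixing idea described above can be illustrated with a minimal NumPy sketch. This is not the proposed AAG, which is optimized rather than fixed; it only shows the basic operation of blending a sample's amplitude spectrum with that of a given distractor image while keeping the sample's phase. The function name `mix_amplitude` and the mixing weight `lam` are hypothetical names introduced for illustration.

```python
import numpy as np

def mix_amplitude(image, distractor, lam=0.5):
    """Blend the amplitude spectrum of `image` with that of `distractor`,
    keeping the phase spectrum of `image`.

    `image` and `distractor` are real arrays of identical shape (H, W) or
    (H, W, C). `lam` is an assumed mixing weight in [0, 1]; lam=0 returns
    the original image.
    """
    # 2D FFT over the spatial axes (per channel if a channel axis exists).
    f_img = np.fft.fft2(image, axes=(0, 1))
    f_dis = np.fft.fft2(distractor, axes=(0, 1))

    # Linear interpolation of the two amplitude spectra.
    amplitude = (1.0 - lam) * np.abs(f_img) + lam * np.abs(f_dis)

    # Phase is taken from the training sample only.
    phase = np.angle(f_img)

    # Recombine and return to the spatial domain; the imaginary part is
    # numerical noise because the mixed spectrum stays Hermitian-symmetric.
    mixed = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(mixed, axes=(0, 1)))


rng = np.random.default_rng(0)
sample = rng.random((8, 8, 3))       # stand-in for a training image
distractor = rng.random((8, 8, 3))   # stand-in for a distractor image
augmented = mix_amplitude(sample, distractor, lam=0.5)
```

In this sketch the distractor is supplied by the caller and the weight is fixed; in the setting described above, the amplitude would instead come from a generator trained to trade off robustness against retention of phase patterns.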