Despite their success in various fields, deep neural networks remain vulnerable to adversarial samples, which has been a major impediment to their broad application. Recent work suggests that adversarially trained models place greater emphasis on low-frequency information to achieve higher robustness. Several attempts have been made to exploit this frequency characteristic, but all face the same problem: applying low-pass filters directly to input images causes an irreversible loss of discriminative information and generalizes poorly to datasets with distinct frequency features. This paper presents a plug-and-play module, the Frequency Preference Control Module, that adaptively reconfigures the low- and high-frequency components of intermediate feature representations, making better use of frequency information in robust learning. Empirical studies show that the proposed module can be easily incorporated into any adversarial training framework and further improves model robustness across different architectures and datasets. Additional experiments examine how the frequency bias of robust models affects the adversarial training process and the final robustness.
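To make the idea of reweighting frequency bands in a feature map concrete, the following is a minimal NumPy sketch. It is not the paper's module: the function names, the circular FFT mask, the `radius` cutoff, and the fixed scalar weights `w_low` and `w_high` are all assumptions chosen for illustration, whereas the abstract describes an adaptive reconfiguration whose details are not given here.

```python
import numpy as np


def split_frequencies(feat, radius):
    """Split a (C, H, W) feature map into low- and high-frequency parts.

    Hypothetical helper: uses a circular mask of the given radius
    (in integer frequency units) in the 2-D Fourier domain.
    """
    _, h, w = feat.shape
    # Integer frequency indices along each spatial axis (0, 1, ..., -1).
    fy = np.fft.fftfreq(h)[:, None] * h
    fx = np.fft.fftfreq(w)[None, :] * w
    mask = (np.sqrt(fy ** 2 + fx ** 2) <= radius).astype(float)

    spec = np.fft.fft2(feat, axes=(-2, -1))
    low = np.fft.ifft2(spec * mask, axes=(-2, -1)).real
    high = feat - low  # everything outside the mask
    return low, high


def reweight_frequencies(feat, radius, w_low, w_high):
    """Recombine the two bands with fixed scalar weights (an assumption)."""
    low, high = split_frequencies(feat, radius)
    return w_low * low + w_high * high


# Example: favor low frequencies in a random 3-channel feature map.
rng = np.random.default_rng(0)
features = rng.standard_normal((3, 8, 8))
smoothed = reweight_frequencies(features, radius=2.0, w_low=1.0, w_high=0.25)
```

Because the sketch operates on a feature map rather than the input image and keeps a nonzero weight on the high-frequency band, no information is discarded outright; setting both weights to 1 returns the original features unchanged.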