Recent research has shown that Deep Neural Networks (DNNs) are highly vulnerable to adversarial samples, which transfer well across models and can be used to attack unknown black-box models. To improve the transferability of adversarial samples, several feature-based adversarial attack methods have been proposed that disrupt neuron activations in the middle layers. However, current state-of-the-art feature-based attack methods typically incur additional computational cost to estimate the importance of neurons. To address this challenge, we propose a Singular Value Decomposition (SVD)-based feature-level attack method. Our approach is inspired by the finding that the eigenvectors associated with the larger singular values, obtained by decomposing middle-layer features, exhibit superior generalization and attention properties. Specifically, we conduct the attack by retaining only the feature associated with the Top-1 singular value to compute the output logits, which are then combined with the original logits to optimize adversarial examples. Extensive experiments verify the effectiveness of the proposed method, which can be easily integrated into various baselines to significantly enhance the transferability of adversarial samples against both normally trained CNNs and advanced defense strategies. The source code of this study is available at https://github.com/WJJLL/SVD-SSA
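To make the core step of the abstract concrete, the following is a minimal sketch, not the authors' implementation (see the repository linked above for that). It assumes a middle-layer feature map flattened to a channels-by-positions matrix, approximates the Top-1 singular component with power iteration, and uses a toy classifier head (average pooling followed by a linear layer). The function names, the toy head, and the choice to combine the two sets of logits by simple addition are all illustrative assumptions; the abstract does not specify the combination rule.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def top1_component(F, iters=100):
    """Rank-1 part of F tied to its largest singular value: sigma1 * u1 * v1^T.

    Uses power iteration on F^T F. Assumes F is nonzero and that the all-ones
    start vector is not orthogonal to the leading right singular vector.
    """
    Ft = [list(col) for col in zip(*F)]
    v = [1.0] * len(F[0])
    for _ in range(iters):
        w = matvec(Ft, matvec(F, v))              # (F^T F) v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Fv = matvec(F, v)
    sigma = math.sqrt(sum(x * x for x in Fv))     # largest singular value
    u = [x / sigma for x in Fv]
    return [[sigma * ui * vj for vj in v] for ui in u]

def logits(F, W):
    """Hypothetical head: average-pool each channel, then apply linear map W."""
    pooled = [sum(row) / len(row) for row in F]
    return matvec(W, pooled)

def cross_entropy(z, label):
    """Softmax cross-entropy of logits z for the given class index."""
    m = max(z)
    return m + math.log(sum(math.exp(x - m) for x in z)) - z[label]

# Toy feature map: 2 channels x 2 spatial positions; identity classifier.
F = [[3.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]

top1 = top1_component(F)
# Assumed combination rule: add original logits and Top-1-feature logits.
combined = [a + b for a, b in zip(logits(F, W), logits(top1, W))]
loss = cross_entropy(combined, label=0)
print(top1, combined, loss)
```

In an actual attack, a loss of this kind would be maximized with respect to the input image by backpropagating through the network and the decomposition, which in practice calls for an automatic-differentiation framework rather than plain Python.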