Recent research has shown that Deep Neural Networks (DNNs) are highly vulnerable to adversarial samples, which often transfer across models and can therefore be used to attack unknown black-box models. To improve the transferability of adversarial samples, several feature-based adversarial attack methods have been proposed that disrupt neuron activations in intermediate layers. However, current state-of-the-art feature-based attacks typically incur additional computational cost to estimate neuron importance. To address this challenge, we propose a Singular Value Decomposition (SVD)-based feature-level attack method. Our approach is inspired by the observation that the eigenvectors associated with the larger singular values of the middle-layer features exhibit superior generalization and attention properties. Specifically, we conduct the attack by retaining only the feature component associated with the Top-1 singular value, computing output logits from it, and combining these with the original logits to optimize the adversarial perturbation. Extensive experiments verify the effectiveness of the proposed method, which significantly enhances the transferability of adversarial samples against various baseline models and defense strategies. The source code of this study is available at \url{https://anonymous.4open.science/r/SVD-SSA-13BF/README.md}.
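The Top-1 retention step described above can be illustrated with a minimal NumPy sketch. It is not the released implementation: the feature shape `(C, H, W)`, the pooling-plus-linear `head`, the weight `alpha`, and the use of a weighted sum to combine the two sets of logits are all assumptions made for illustration only.

```python
import numpy as np

def top1_svd_feature(feat):
    """Rank-1 approximation of a (C, H, W) feature map (layout is an assumption)."""
    c, h, w = feat.shape
    mat = feat.reshape(c, h * w)                  # flatten spatial dimensions
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    rank1 = s[0] * np.outer(u[:, 0], vt[0])       # keep only the largest singular value
    return rank1.reshape(c, h, w)

def combined_logits(feat, head_weight, alpha=1.0):
    """Original logits plus logits from the Top-1 feature.

    The head (global average pooling, then a linear layer) and the weighted
    sum with `alpha` are hypothetical stand-ins for the rest of the network.
    """
    def head(f):
        return head_weight @ f.mean(axis=(1, 2))
    return head(feat) + alpha * head(top1_svd_feature(feat))

# Toy example with random data: 8 channels, 4x4 spatial map, 10 classes.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
head_weight = rng.standard_normal((10, 8))
logits = combined_logits(feat, head_weight)
```

In an actual attack, the combined logits would feed a classification loss whose gradient with respect to the input drives the perturbation update; that loop is omitted here because it depends on the model and framework.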