In this paper, we use the interactions inside adversarial perturbations to explain and boost adversarial transferability. We discover and prove a negative correlation between adversarial transferability and the interactions inside adversarial perturbations, and we further verify this correlation across different DNNs and various inputs. Moreover, this negative correlation offers a unified perspective for understanding current transferability-boosting methods: we prove that some classic methods of enhancing transferability essentially decrease the interactions inside adversarial perturbations. Based on this, we propose to directly penalize interactions during the attack process, which significantly improves adversarial transferability.
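To make the notion of "interaction" concrete, the following is a minimal sketch, not the paper's formulation. The abstract does not state how interactions are defined, so the sketch assumes a simple pairwise form, I_ij = v({i,j}) - v({i}) - v({j}) + v(∅), evaluated with all other perturbation units switched off. The names `score`, `masked`, `pairwise_interaction`, `mean_interaction`, `penalized_objective`, and the weight `lam` are hypothetical, and the toy `score` functions stand in for a classification loss.

```python
from itertools import combinations


def masked(delta, keep):
    """Keep only the perturbation units whose indices are in `keep`; zero the rest."""
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def pairwise_interaction(score, x, delta, i, j):
    """Assumed pairwise interaction of units i and j, other units switched off.

    v(S) is the change in `score` when only the units in S are applied to x.
    """
    def v(keep):
        perturbed = [a + b for a, b in zip(x, masked(delta, keep))]
        return score(perturbed) - score(x)

    return v({i, j}) - v({i}) - v({j}) + v(set())


def mean_interaction(score, x, delta):
    """Average the assumed pairwise interaction over all pairs of units."""
    pairs = list(combinations(range(len(delta)), 2))
    return sum(pairwise_interaction(score, x, delta, i, j) for i, j in pairs) / len(pairs)


def penalized_objective(score, x, delta, lam):
    """Attack objective to maximize: score on x + delta minus a weighted interaction penalty."""
    perturbed = [a + b for a, b in zip(x, delta)]
    return score(perturbed) - lam * mean_interaction(score, x, delta)


# Toy stand-ins for a loss: one purely additive, one that couples units 0 and 1.
additive = lambda z: z[0] + 2 * z[1] + 3 * z[2]
coupled = lambda z: z[0] * z[1] + z[2]

x = [1.0, 1.0, 1.0]
delta = [0.5, 0.5, 0.5]

print(mean_interaction(additive, x, delta))            # no coupling between units
print(pairwise_interaction(coupled, x, delta, 0, 1))   # units 0 and 1 are coupled
print(penalized_objective(coupled, x, delta, lam=1.0))
```

In this toy setting the additive score has zero interaction for every pair, while the coupled score has a nonzero interaction only between units 0 and 1. Under the stated assumptions, an attack that maximizes `penalized_objective` would favor perturbations whose units contribute more independently, which is the intuition behind penalizing interactions.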