Deep neural networks (DNNs) are highly susceptible to adversarial examples: subtle perturbations applied to inputs that are often imperceptible to humans yet lead to incorrect model predictions. In black-box scenarios, however, existing adversarial examples exhibit limited transferability and struggle to effectively compromise multiple unseen DNN models. Previous strategies enhance the cross-model generalization of adversarial examples by introducing versatility into adversarial perturbations, thereby improving transferability. However, further refining perturbation versatility often demands intricate algorithm design and substantial computational cost. In this work, we propose an input transpose method that requires almost no additional labor or computation but can significantly improve the transferability of existing adversarial strategies. Even without adding adversarial perturbations, our method demonstrates considerable effectiveness in cross-model attacks. Our exploration finds that on specific datasets, a mere $1^\circ$ left or right rotation might be sufficient for most adversarial examples to deceive unseen models. Our further analysis suggests that this transferability improvement triggered by rotating only $1^\circ$ may stem from visible pattern shifts in the DNN's low-level feature maps. Moreover, this transferability exhibits optimal angles that, when identified under unrestricted query conditions, could potentially yield even greater performance.
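The core operation described above is simple enough to sketch directly. The following is a minimal illustration, assuming adversarial examples stored as NumPy arrays in height-width-channel (HWC) layout; the function name `transpose_input` is hypothetical, not from the paper, and the small rotations discussed in the abstract would require an additional image-rotation routine not shown here.

```python
import numpy as np

def transpose_input(x: np.ndarray) -> np.ndarray:
    """Swap the height and width axes of an HWC image array.

    A minimal sketch of the 'input transpose' idea: the crafted
    adversarial example is transposed before being submitted to the
    unseen target model, at essentially zero extra computational cost.
    """
    # Axes (1, 0, 2): swap H and W, leave the channel axis in place.
    return np.transpose(x, (1, 0, 2))

# Placeholder adversarial example: a 224x224 RGB array of random values.
adv = np.random.rand(224, 224, 3).astype(np.float32)
adv_t = transpose_input(adv)
# For square inputs, the transposed image has the same shape and can be
# fed to the target model without any resizing.
assert adv_t.shape == adv.shape
```

Because the transformation is a single axis permutation, it adds no perturbation-optimization steps and no extra queries to the target model.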