The transferability of adversarial examples across deep neural networks (DNNs) is the crux of many black-box attacks. Many prior efforts have sought to improve transferability by increasing the diversity of the inputs fed to substitute models. In this paper, by contrast, we pursue diversity in the substitute models themselves and advocate attacking a Bayesian model to achieve desirable transferability. Deriving from the Bayesian formulation, we develop a principled strategy for possible finetuning, which can be combined with many off-the-shelf Gaussian posterior approximations over DNN parameters. Extensive experiments on common benchmark datasets verify the effectiveness of our method: it outperforms recent state-of-the-art methods by large margins (roughly a 19% absolute increase in average attack success rate on ImageNet), and combining it with these recent methods yields further performance gains. Our code: https://github.com/qizhangli/MoreBayesian-attack.
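One way to read "attacking a Bayesian model" is as a Monte Carlo procedure: draw several substitute models from a Gaussian posterior over the parameters, average their input gradients, and take a sign-gradient step on that average. The sketch below illustrates this idea under strong simplifying assumptions and is not the authors' implementation: the substitute is a toy logistic-regression model rather than a DNN, the posterior is an isotropic Gaussian centred on the given weights, and all names (`bayesian_attack`, `sigma`, `n_samples`, and so on) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)


def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]


def bayesian_attack(w_mean, b, x, y, eps=0.1, step=0.02, n_iter=10,
                    n_samples=20, sigma=0.05, seed=0):
    """Iterative sign-gradient attack on an assumed isotropic Gaussian posterior."""
    rng = random.Random(seed)
    x_adv = list(x)
    for _ in range(n_iter):
        avg_grad = [0.0] * len(x)
        for _ in range(n_samples):
            # Draw one substitute model from N(w_mean, sigma^2 I) (assumption).
            w = [wi + rng.gauss(0.0, sigma) for wi in w_mean]
            g = input_gradient(w, b, x_adv, y)
            avg_grad = [a + gi / n_samples for a, gi in zip(avg_grad, g)]
        # Ascend the averaged loss, then project back into the L_inf ball.
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, avg_grad)]
        x_adv = [min(max(xa, xo - eps), xo + eps) for xa, xo in zip(x_adv, x)]
    return x_adv


if __name__ == "__main__":
    # Toy weights and input, chosen only for illustration.
    print(bayesian_attack([1.0, -2.0, 0.5], 0.0, [0.5, -0.5, 1.0], 1))
```

Averaging gradients over sampled parameters is what distinguishes this from attacking a single deterministic substitute; with a real DNN, the sampling step would presumably be replaced by whichever Gaussian posterior approximation is in use.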