Improving Transferability of Adversarial Examples via Bayesian Attacks

This paper substantially extends our work published at ICLR. The ICLR work advocated enhancing the transferability of adversarial examples by placing a Bayesian formulation on the model parameters, which effectively emulates an ensemble of infinitely many deep neural networks. In this paper, we extend the Bayesian formulation to the model input as well, enabling joint diversification of the model input and the model parameters. Our empirical findings show that: 1) combining Bayesian formulations over both the model input and the model parameters significantly improves transferability; and 2) introducing more advanced approximations of the posterior distribution over the model input further enhances adversarial transferability, surpassing all state-of-the-art methods when attacking without model fine-tuning. Moreover, we propose a principled approach to fine-tuning the model parameters under this extended Bayesian formulation. The derived optimization objective inherently encourages flat minima in both the parameter space and the input space. Extensive experiments demonstrate that our method achieves a new state of the art for transfer-based attacks, improving the average success rate on ImageNet and CIFAR-10 by 19.14% and 2.08%, respectively, compared with our basic Bayesian method from ICLR. We will make our code publicly available.
