Improving the Transferability of Adversarial Examples with Arbitrary Style Transfer

Deep neural networks are vulnerable to adversarial examples crafted by applying human-imperceptible perturbations to clean inputs. Although many attack methods achieve high success rates in the white-box setting, they often exhibit weak transferability in the black-box setting. Recently, various methods have been proposed to improve adversarial transferability, among which input transformation is one of the most effective. In this work, we observe that existing input transformation-based works mainly use transformed data from the same domain for augmentation. Inspired by domain generalization, we aim to further improve transferability using data augmented from different domains. Specifically, a style transfer network can alter the distribution of low-level visual features in an image while preserving the semantic content perceived by humans. Hence, we propose a novel attack method named Style Transfer Method (STM) that uses a proposed arbitrary style transfer network to transform images into different domains. To keep the semantic information of stylized images consistent for the classification network, we fine-tune the style transfer network and mix the generated images, with random noise added, with the original images to maintain semantic consistency and boost input diversity. Extensive experiments on the ImageNet-compatible dataset show that our proposed method significantly improves adversarial transferability on both normally trained and adversarially trained models compared with state-of-the-art input transformation-based attacks. Code is available at: https://github.com/Zhijin-Ge/STM.
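
The mixing step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `toy_stylize` and `mix_stylized`, the mixing weight `gamma`, and the noise range `noise_scale` are all hypothetical, and the per-channel statistics matching only stands in for the fine-tuned style transfer network that the abstract refers to.

```python
import numpy as np


def toy_stylize(image, style, eps=1e-5):
    """Hypothetical stand-in for a style transfer network.

    Shifts the per-channel mean and standard deviation of `image`
    toward those of `style`, which changes low-level statistics while
    leaving the spatial layout of the content untouched.
    """
    mu_c = image.mean(axis=(0, 1), keepdims=True)
    sd_c = image.std(axis=(0, 1), keepdims=True)
    mu_s = style.mean(axis=(0, 1), keepdims=True)
    sd_s = style.std(axis=(0, 1), keepdims=True)
    return (image - mu_c) / (sd_c + eps) * sd_s + mu_s


def mix_stylized(image, stylized, rng, gamma=0.5, noise_scale=0.05):
    """Mix the original image with a noisy stylized copy.

    `gamma` (assumed) weights the original image; `noise_scale`
    (assumed) bounds the uniform noise added to the stylized image.
    """
    noise = rng.uniform(-noise_scale, noise_scale, size=image.shape)
    mixed = gamma * image + (1.0 - gamma) * (stylized + noise)
    # Keep pixel values in the valid [0, 1] range.
    return np.clip(mixed, 0.0, 1.0)


rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))  # placeholder H x W x C image in [0, 1]
style = rng.random((8, 8, 3))  # placeholder style image in [0, 1]
stylized = toy_stylize(image, style)

# Several differently perturbed copies of the same input.
augmented = [mix_stylized(image, stylized, rng) for _ in range(4)]
```

In a transformation-based attack, gradients of the classification loss would typically be averaged over such augmented copies at each iteration. That loop is omitted here because the abstract does not specify its details.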
