Data-Free Universal Attack by Exploiting the Intrinsic Vulnerability of Deep Models

Deep neural networks (DNNs) are susceptible to Universal Adversarial Perturbations (UAPs), which are instance-agnostic perturbations that can deceive a target model across a wide range of samples. Unlike instance-specific adversarial examples, UAPs present a greater challenge because they must generalize across different samples and models. Generating UAPs typically requires access to numerous examples, which is a strong assumption in real-world tasks. In this paper, we propose a novel data-free method called Intrinsic UAP (IntriUAP) that exploits the intrinsic vulnerabilities of deep models. We analyze a series of popular deep models composed of linear and nonlinear layers with a Lipschitz constant of 1, revealing that the vulnerability of these models is predominantly influenced by their linear components. Based on this observation, we leverage the ill-conditioned nature of the linear components by aligning the UAP with the right singular vectors corresponding to the maximum singular value of each linear layer. Remarkably, our method achieves highly competitive performance in attacking popular image classification deep models without using any image samples. We also evaluate the black-box attack performance of our method, showing that it matches the state-of-the-art baseline for data-free methods on models that conform to our theoretical framework. Beyond the data-free assumption, IntriUAP also operates under a weaker assumption, in which the adversary can access only a few of the victim model's layers. Experiments demonstrate that the attack success rate decreases by only 4% when the adversary has access to just 50% of the linear layers in the victim model.
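The linear-algebra fact behind the alignment step can be illustrated with a small sketch. This is not the authors' implementation: the layer shape, the random weights, and the helper names `top_right_singular_vector` and `output_gain` are assumptions made purely for illustration. The sketch shows only that, for a single linear layer, a unit-norm input direction aligned with the top right singular vector changes the layer's output by exactly the largest singular value, which is the most any unit-norm direction can achieve.

```python
import numpy as np


def top_right_singular_vector(weight):
    """Return (largest singular value, matching right singular vector)."""
    # numpy returns singular values in descending order, so index 0 is the largest.
    _, singular_values, vt = np.linalg.svd(weight, full_matrices=False)
    return singular_values[0], vt[0]


def output_gain(weight, direction):
    """L2 norm of the layer's output change for a unit-norm input perturbation."""
    unit = direction / np.linalg.norm(direction)
    return np.linalg.norm(weight @ unit)


rng = np.random.default_rng(0)

# Hypothetical linear layer with 128 inputs and 64 outputs (illustrative only).
weight = rng.standard_normal((64, 128))

sigma_max, v_max = top_right_singular_vector(weight)
random_direction = rng.standard_normal(128)

aligned_gain = output_gain(weight, v_max)            # equals sigma_max
random_gain = output_gain(weight, random_direction)  # never exceeds sigma_max

print("gain along top right singular vector:", aligned_gain)
print("gain along a random direction:       ", random_gain)
```

How a perturbation is aligned across several layers, and how the nonlinear layers are handled, is specific to the paper and is not reproduced here.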


