Deep neural networks (DNNs) are susceptible to Universal Adversarial Perturbations (UAPs): instance-agnostic perturbations that can deceive a target model across a wide range of samples. Unlike instance-specific adversarial examples, UAPs pose a greater challenge because they must generalize across different samples and models. Generating UAPs typically requires access to numerous examples, which is a strong assumption in real-world tasks. In this paper, we propose a novel data-free method, Intrinsic UAP (IntriUAP), which exploits the intrinsic vulnerabilities of deep models. We analyze a family of popular deep models composed of linear layers and nonlinear layers with a Lipschitz constant of 1, and show that the vulnerability of these models is dominated by their linear components. Based on this observation, we exploit the ill-conditioned nature of the linear components by aligning the UAP with the right singular vector corresponding to the maximum singular value of each linear layer. Remarkably, our method achieves highly competitive performance in attacking popular image-classification models without using any image samples. We also evaluate the black-box attack performance of our method, showing that it matches the state-of-the-art data-free baseline on models that conform to our theoretical framework. Beyond the data-free assumption, IntriUAP also operates under an even weaker assumption, in which the adversary can access only a few of the victim model's layers. Experiments show that the attack success rate drops by only 4% when the adversary has access to just 50% of the linear layers of the victim model.
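As a minimal sketch of the core idea above (not the paper's actual algorithm), the snippet below uses power iteration on a single weight matrix to find the right singular vector associated with its largest singular value; a perturbation aligned with that direction is amplified the most by the linear layer. The matrix `W`, the budget `eps`, and the helper name are illustrative assumptions.

```python
import numpy as np

def top_right_singular_vector(W, iters=100, seed=0):
    """Power iteration on W^T W: converges to the right singular vector
    associated with the largest singular value of W."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = W.T @ (W @ v)      # one step of power iteration on W^T W
        v /= np.linalg.norm(v)
    return v

# Toy linear layer with singular values 3 and 1 (illustrative example).
W = np.array([[3.0, 0.0],
              [0.0, 1.0]])
eps = 0.1                       # perturbation budget (assumed)
delta = eps * top_right_singular_vector(W)   # layer-wise UAP candidate
# Aligned input is amplified by sigma_max: ||W @ delta|| ~= eps * 3
print(np.linalg.norm(W @ delta))
```

For this diagonal `W`, the top right singular vector is the first basis vector (up to sign), so the aligned perturbation of norm `eps` is stretched to roughly `eps * sigma_max` after the layer, whereas a perturbation along the other singular direction would only reach `eps * 1`.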