Universal Adversarial Perturbations (UAPs) are imperceptible, image-agnostic vectors that cause deep neural networks (DNNs) to misclassify inputs with high probability. In practical attack scenarios, adversarial perturbations may undergo transformations, such as changes in pixel intensity or scaling, before being added to DNN inputs. Existing methods do not create UAPs that are robust to these real-world transformations, which limits their applicability in practical attack scenarios. In this work, we introduce and formulate UAPs that are robust against real-world transformations. We build an iterative algorithm based on probabilistic robustness bounds and use it to construct UAPs robust to transformations generated by composing arbitrary sub-differentiable transformation functions. We perform an extensive evaluation on the popular CIFAR-10 and ILSVRC 2012 datasets, measuring our UAPs' robustness under a wide range of common real-world transformations such as rotation and contrast changes. We further show that, by using a set of primitive transformations, our method generalizes well to unseen transformations such as fog and JPEG compression. Our results show that our method can generate UAPs up to 23% more robust than state-of-the-art baselines.
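To make the setting concrete, the following is a minimal illustrative sketch and not the algorithm proposed in this work. It swaps in a plain expectation-over-transformations update in place of the probabilistic robustness bounds. It also assumes a toy linear classifier with weights `W`, a single hypothetical transformation (scaling the perturbation by a random intensity factor `c`), and an L-infinity budget `eps`. All function names and parameters are assumptions introduced for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_ce(X, y, W):
    # Mean cross-entropy of the toy linear classifier logits = X @ W.
    probs = softmax(X @ W)
    return -np.log(probs[np.arange(len(y)), y]).mean()

def robust_uap_sketch(X, y, W, eps=0.1, steps=50, n_samples=8, lr=0.02, seed=0):
    """Toy universal perturbation (illustrative assumption, not the paper's method).

    One vector `delta` is shared by all inputs. Before it is added, it is
    scaled by a random factor c in [0.5, 1.5], a stand-in for an
    intensity-changing transformation applied to the perturbation.
    """
    rng = np.random.default_rng(seed)
    onehot = np.eye(W.shape[1])[y]
    delta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = np.zeros_like(delta)
        # Monte Carlo estimate of the loss gradient over sampled transformations.
        for c in rng.uniform(0.5, 1.5, size=n_samples):
            probs = softmax((X + c * delta) @ W)
            # d(mean CE)/d(delta): chain rule through the scaled perturbation.
            grad += c * ((probs - onehot) @ W.T).mean(axis=0)
        # Signed ascent step, then projection onto the L-infinity ball.
        delta = np.clip(delta + lr * np.sign(grad), -eps, eps)
    return delta
```

Averaging the gradient over sampled transformations is what pushes the perturbation to stay effective after it is transformed. A method based on robustness bounds would additionally control the probability that a transformed perturbation still causes misclassification.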