Adversarial attacks on deep-learning models have received increasing attention in recent years. Most work in this area has focused on gradient-based techniques, so-called white-box attacks, in which the attacker has access to the targeted model's internal parameters; this assumption is usually unrealistic in practice. Some attacks additionally use the entire pixel space to fool a given model, which is neither practical nor physically realizable (i.e., feasible in the real world). In contrast, we propose a gradient-free method that uses the learned image manifold of a pretrained generative adversarial network (GAN) to generate naturalistic physical adversarial patches for object detectors. We show that the proposed method works both digitally and physically.
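To make the idea of a gradient-free search over a generator's latent space concrete, the following is a minimal sketch and not the method proposed in this work. It assumes a simple random-search hill climber, and both `toy_generator` and `toy_detector_score` are hypothetical stand-ins for a pretrained GAN generator and a black-box object detector. The attacker only queries the detector's score and never reads gradients or model parameters.

```python
import math
import random

LATENT_DIM = 8  # hypothetical latent size; real GANs use far more dimensions


def toy_generator(z):
    """Hypothetical stand-in for a pretrained GAN generator.

    Maps a latent vector to a 'patch'. tanh keeps every value in [-1, 1],
    mimicking a bounded image range.
    """
    return [math.tanh(v) for v in z]


def toy_detector_score(patch):
    """Hypothetical stand-in for a black-box detector's confidence in [0, 1].

    Only this score is visible to the attacker; no gradients are exposed.
    """
    mean = sum(patch) / len(patch)
    return 1.0 / (1.0 + math.exp(4.0 * mean))


def random_search_attack(steps=200, sigma=0.3, seed=0):
    """Gradient-free hill climbing over the latent vector.

    Returns (initial_score, best_score, best_latent).
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    initial = best = toy_detector_score(toy_generator(z))
    for _ in range(steps):
        # Perturb the latent code, not the pixels, so every candidate
        # stays on the generator's output manifold.
        candidate = [v + sigma * rng.gauss(0.0, 1.0) for v in z]
        score = toy_detector_score(toy_generator(candidate))
        if score < best:  # keep the step only if detector confidence drops
            z, best = candidate, score
    return initial, best, z


if __name__ == "__main__":
    initial, best, _ = random_search_attack()
    print(f"detector score: {initial:.3f} -> {best:.3f}")
```

Because the search moves in latent space rather than pixel space, every candidate patch is an output of the generator. That is the property the manifold-based formulation relies on to keep patches naturalistic.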