A growing body of work has shown that deep neural networks are susceptible to adversarial examples: small perturbations applied to the model's input that lead to incorrect predictions. Unfortunately, most of the literature focuses on visually imperceptible perturbations applied to digital images, which are often, by design, impossible to deploy on physical targets. We present Adversarial Scratches: a novel L0 black-box attack that takes the form of scratches in images and is far more deployable than other state-of-the-art attacks. Adversarial Scratches leverage Bézier curves to reduce the dimension of the search space and, optionally, to constrain the attack to a specific location. We test Adversarial Scratches in several scenarios, including a publicly available API and images of traffic signs. Results show that our attack often achieves a higher fooling rate than other deployable state-of-the-art methods, while requiring significantly fewer queries and modifying very few pixels.
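To illustrate how a Bézier curve can shrink the search space, the following is a minimal sketch, not the authors' implementation. It assumes a quadratic Bézier curve with a single RGB color, so that one scratch is described by nine numbers (three control points plus a color) rather than by one value per pixel. The function names `bezier_pixels` and `apply_scratch`, the sampling density, and the parameter layout are all hypothetical choices made for this example.

```python
def bezier_pixels(p0, p1, p2, width, height, n_samples=200):
    """Return the set of (row, col) pixels touched by a quadratic Bezier curve.

    B(t) = (1-t)^2 * p0 + 2(1-t)t * p1 + t^2 * p2, for t in [0, 1].
    Points are (x, y) pairs; results are clamped to the image bounds.
    """
    pixels = set()
    for i in range(n_samples + 1):
        t = i / n_samples
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        x = a * p0[0] + b * p1[0] + c * p2[0]
        y = a * p0[1] + b * p1[1] + c * p2[1]
        col = min(max(round(x), 0), width - 1)
        row = min(max(round(y), 0), height - 1)
        pixels.add((row, col))
    return pixels


def apply_scratch(image, params):
    """Draw one scratch on a copy of `image` (a list of rows of RGB tuples).

    `params` is an assumed 9-number encoding:
    (x0, y0, x1, y1, x2, y2, r, g, b).
    """
    height, width = len(image), len(image[0])
    x0, y0, x1, y1, x2, y2, r, g, b = params
    out = [row[:] for row in image]  # copy rows; pixel tuples are immutable
    for row, col in bezier_pixels((x0, y0), (x1, y1), (x2, y2), width, height):
        out[row][col] = (r, g, b)
    return out


# A 32x32 black image and one red scratch described by 9 parameters.
image = [[(0, 0, 0)] * 32 for _ in range(32)]
params = (2, 2, 16, 30, 29, 4, 255, 0, 0)
scratched = apply_scratch(image, params)

# Number of modified pixels: bounded by the number of curve samples,
# which is what keeps the perturbation sparse in the L0 sense.
changed = sum(
    scratched[r][c] != image[r][c] for r in range(32) for c in range(32)
)
```

Under these assumptions, a black-box optimizer would search over the nine entries of `params` and query the model with `scratched`, instead of searching over all 32 x 32 x 3 pixel values. Restricting the control points to a sub-rectangle of the image would confine the scratch to that region, since a Bézier curve lies within the convex hull of its control points.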