In this paper, we present a novel approach for generating naturalistic adversarial patches without using GANs. Our approach generates a Dynamic Adversarial Patch (DAP) that looks naturalistic while maintaining high attack efficiency and robustness in real-world scenarios. To achieve this, we redefine the optimization problem by introducing a new objective function in which a similarity metric is used to construct a similarity loss. This loss guides the patch to follow predefined patterns while maximizing the victim model's loss function. Our technique directly modifies the pixel values of the patch, which gives greater flexibility and a larger space for incorporating multiple transformations than GAN-based techniques. Furthermore, most clothing-based physical attacks assume static objects and ignore the transformations caused by non-rigid deformation due to changes in a person's pose. To address this limitation, we incorporate a ``Creases Transformation'' (CT) block, i.e., a preprocessing block following an Expectation Over Transformation (EOT) block, which generates a wide variety of transformed patches that are incorporated into the training process to increase the patch's robustness to real-world distortions (e.g., creases in the clothing, rotation, re-scaling, random noise, and brightness and contrast variations). We demonstrate that real-world variations in clothing and object poses (i.e., the above-mentioned distortions) lead to a drop in the performance of state-of-the-art attacks. For instance, these techniques achieve success rates of only 20\% in the physical world and 30.8\% in the digital world, while our attack achieves success rates of up to 65\% and 84.56\%, respectively, when attacking the YOLOv3tiny detector deployed in smart cameras at the edge.
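The objective described in the abstract, a victim-model loss to be maximized together with a similarity loss that keeps the patch close to a predefined pattern, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: cosine similarity as the similarity metric, the weighting term `weight`, the flattened-list patch representation, the simple brightness/contrast/noise sampler standing in for the EOT block, and the `victim_loss` callable standing in for the detector's loss. The Creases Transformation is not modeled.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two flattened patches (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_loss(patch, reference):
    """Assumed similarity loss: 0 when the patch matches the reference pattern."""
    return 1.0 - cosine_similarity(patch, reference)


def random_transform(patch, rng):
    """EOT-style stand-in: random contrast, brightness and noise, clipped to [0, 1]."""
    contrast = rng.uniform(0.8, 1.2)
    brightness = rng.uniform(-0.1, 0.1)
    return [
        min(1.0, max(0.0, contrast * p + brightness + rng.gauss(0.0, 0.02)))
        for p in patch
    ]


def total_loss(patch, reference, victim_loss, weight=1.0, samples=8, seed=0):
    """Quantity to minimize: negative expected victim loss plus weighted similarity loss.

    `victim_loss` is a hypothetical callable that maps a transformed patch to
    the victim model's loss; a real attack would run the detector here.
    """
    rng = random.Random(seed)
    expected_victim = sum(
        victim_loss(random_transform(patch, rng)) for _ in range(samples)
    ) / samples
    return -expected_victim + weight * similarity_loss(patch, reference)
```

Minimizing `total_loss` with respect to the pixel values pushes the victim loss up across the sampled transformations while penalizing drift away from the reference pattern. In practice this would be done with an automatic-differentiation framework rather than plain Python lists.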