Unified Adversarial Patch for Visible-Infrared Cross-modal Attacks in the Physical World

Physical adversarial attacks pose a severe threat to DNN-based object detectors. To enhance security, visible and infrared sensors are deployed together in various scenarios, a combination that has proven effective at defeating existing single-modal physical attacks. To demonstrate the risks that remain in such settings, we design a unified adversarial patch that performs cross-modal physical attacks, evading detection in both modalities simultaneously with a single patch. Because visible and infrared sensors rely on different imaging mechanisms, our work manipulates the patch's shape, a feature whose changes are captured in both modalities. To address the resulting challenges, we propose a novel boundary-limited shape optimization approach that yields compact and smooth shapes for the adversarial patch, making it easy to implement in the physical world. We also introduce a score-aware iterative evaluation method that balances the fooling degree between the visible and infrared detectors during optimization, guiding the adversarial patch to iteratively reduce the predicted scores of both sensors. Furthermore, we propose an affine-transformation-based enhancement strategy that makes the learnable shape robust to various viewing angles, mitigating the shape deformation caused by different shooting angles in the real world. Our method is evaluated against several state-of-the-art object detectors, achieving an Attack Success Rate (ASR) of over 80%. We also demonstrate the effectiveness of our approach in physical-world scenarios under various settings, including different angles, distances, postures, and scenes, for both visible and infrared sensors.
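To make the affine-transformation idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the patch shape is represented as a list of 2D polygon vertices and that viewing-angle changes are approximated by random rotation, uniform scaling, and horizontal shear about the shape's centroid. The function names (`affine_matrix`, `transform_shape`, `polygon_area`, `sample_views`) and the parameter ranges are hypothetical choices made for illustration only.

```python
import math
import random


def affine_matrix(angle_deg=0.0, scale=1.0, shear=0.0):
    """Return a 2x2 matrix equal to scale * R(angle) @ [[1, shear], [0, 1]]."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return ((scale * c, scale * (c * shear - s)),
            (scale * s, scale * (s * shear + c)))


def transform_shape(vertices, matrix):
    """Apply the 2x2 matrix to each vertex, about the mean of the vertices."""
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)
    (a, b), (c, d) = matrix
    out = []
    for x, y in vertices:
        dx, dy = x - cx, y - cy
        out.append((a * dx + b * dy + cx, c * dx + d * dy + cy))
    return out


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def sample_views(vertices, n, seed=0, max_angle=30.0,
                 scale_range=(0.8, 1.2), max_shear=0.2):
    """Draw n randomly transformed copies of the shape (hypothetical ranges)."""
    rng = random.Random(seed)
    views = []
    for _ in range(n):
        m = affine_matrix(
            angle_deg=rng.uniform(-max_angle, max_angle),
            scale=rng.uniform(*scale_range),
            shear=rng.uniform(-max_shear, max_shear),
        )
        views.append(transform_shape(vertices, m))
    return views
```

In an optimization loop, each candidate shape could be scored on several such transformed views rather than on a single frontal view, so that a shape is only favored if it remains effective after deformation. How the views are rendered and scored by the visible and infrared detectors is outside the scope of this sketch.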
