Fusion is Not Enough: Single-Modal Attacks to Compromise Fusion Models in Autonomous Driving

Multi-sensor fusion (MSF) is widely adopted for perception in autonomous vehicles (AVs), particularly for 3D object detection with camera and LiDAR sensors. The rationale behind fusion is to capitalize on the strengths of each modality while mitigating their limitations. Advanced deep neural network (DNN)-based fusion techniques have demonstrated leading performance, and fusion models are also perceived as more robust to attacks than single-modal ones because of the redundant information across modalities. In this work, we challenge this perspective with single-modal attacks that target the camera modality, which is considered less significant in fusion but is more affordable for attackers to manipulate. We argue that the weakest link of a fusion model is its most vulnerable modality, and we propose an attack framework that targets advanced camera-LiDAR fusion models with adversarial patches. Our approach employs a two-stage optimization-based strategy: it first comprehensively assesses which image areas are vulnerable under adversarial attack, and then applies attack strategies customized to each fusion model to generate deployable patches. Evaluations with five state-of-the-art camera-LiDAR fusion models on a real-world dataset show that our attacks successfully compromise all of them. On average, our approach either reduces the mean average precision (mAP) of detection from 0.824 to 0.353 or degrades the detection score of the target object from 0.727 to 0.151, demonstrating the effectiveness and practicality of the proposed attack framework.
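To make the two-stage idea concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a hypothetical stand-in "detector" (a logistic model over the pixels of a small grayscale image) in place of a real camera-LiDAR fusion network. Stage 1 ranks image areas by gradient-based sensitivity, and stage 2 optimizes a square patch placed on the most sensitive area so that the score drops. All function names, the toy model, the patch size, and the step settings are assumptions made purely for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def score(image, w, b):
    """Toy 'detection score': a logistic model over pixels (hypothetical stand-in)."""
    return float(sigmoid(np.sum(w * image) + b))


def sensitivity_map(image, w, b):
    """Stage 1a: magnitude of d(score)/d(pixel) for every pixel."""
    s = score(image, w, b)
    return np.abs(s * (1.0 - s) * w)


def most_vulnerable_window(sens, size):
    """Stage 1b: top-left corner of the size x size window with the largest total sensitivity."""
    rows, cols = sens.shape
    best_total, best_pos = -1.0, (0, 0)
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            total = sens[r:r + size, c:c + size].sum()
            if total > best_total:
                best_total, best_pos = total, (r, c)
    return best_pos


def optimize_patch(image, w, b, pos, size, steps=50, lr=0.1):
    """Stage 2: signed-gradient descent on the patch pixels only, clipped to [0, 1]."""
    r, c = pos
    adv = image.copy()  # pixels outside the patch are never modified
    for _ in range(steps):
        s = score(adv, w, b)
        grad = s * (1.0 - s) * w[r:r + size, c:c + size]
        patch = adv[r:r + size, c:c + size] - lr * np.sign(grad)
        adv[r:r + size, c:c + size] = np.clip(patch, 0.0, 1.0)
    return adv


def run_demo(seed=0, side=16, patch_size=4):
    """Build a random toy image and model, then run both stages."""
    rng = np.random.default_rng(seed)
    image = rng.uniform(0.3, 0.7, size=(side, side))
    w = rng.normal(0.0, 0.2, size=(side, side))
    b = 2.0 - np.sum(w * image)  # chosen so the clean logit is 2
    clean = score(image, w, b)
    pos = most_vulnerable_window(sensitivity_map(image, w, b), patch_size)
    adv = optimize_patch(image, w, b, pos, patch_size)
    return clean, score(adv, w, b), pos, adv, image


if __name__ == "__main__":
    clean, attacked, pos, _, _ = run_demo()
    print(f"clean score {clean:.3f}, attacked score {attacked:.3f}, patch at {pos}")
```

In this toy setting the attack changes only a 4 x 4 region of a 16 x 16 image, which mirrors the constraint that a physical patch covers a small part of the camera view. A real attack on a fusion model would replace the logistic score with the model's detection loss and would need to account for physical deployment, neither of which this sketch attempts.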

