Multi-sensor fusion (MSF) is widely adopted for perception in autonomous vehicles (AVs), particularly for 3D object detection with camera and LiDAR sensors. The rationale behind fusion is to capitalize on the strengths of each modality while mitigating their limitations. Advanced deep neural network (DNN)-based fusion techniques have demonstrated leading performance. Fusion models are also perceived as more robust to attacks than single-modal ones because of the redundant information across modalities. In this work, we challenge this perspective with single-modal attacks that target the camera modality, which is considered less significant in fusion but is more affordable for attackers to manipulate. We argue that the weakest link of a fusion model is its most vulnerable modality, and we propose an attack framework that targets advanced camera-LiDAR fusion models with adversarial patches. Our approach employs a two-stage optimization-based strategy that first comprehensively assesses which image areas are vulnerable under adversarial attacks, and then applies customized attack strategies to different fusion models to generate deployable patches. Evaluations with five state-of-the-art camera-LiDAR fusion models on a real-world dataset show that our attacks successfully compromise all models. On average, our approach either reduces the mean average precision (mAP) of detection from 0.824 to 0.353 or degrades the detection score of the target object from 0.727 to 0.151, demonstrating the effectiveness and practicality of the proposed attack framework.