Autonomous vehicles (AVs) increasingly depend on multi-sensor fusion (MSF) to combine complementary modalities such as cameras and LiDAR for robust perception. While this redundancy is intended to safeguard against single-sensor failures, the fusion process itself introduces a subtle and underexplored vulnerability. In this work, we investigate whether an attacker can bypass MSF's redundancy by fabricating cross-sensor consistency, that is, by making multiple sensors agree on the same false object. We design a coordinated, data-level (early-fusion) attack that emulates the outcome of two synchronized physical spoofing sources: an infrared (IR) projection that induces a false camera detection, and a LiDAR signal injection that produces a matching 3D point cluster. Rather than implementing the physical attack hardware, we simulate its sensor-level outcomes by inserting perspective-aware image patches and synthetic LiDAR point clusters aligned in 3D space. This approach preserves the perceptual effects that real IR projection and intentional electromagnetic interference (IEMI)-based spoofing would create at the sensor output. Our large-scale evaluation on 400 KITTI scenes shows that the coordinated spoofing deceives a state-of-the-art perception model with an attack success rate of 85.5%. These findings provide the first quantitative evidence that malicious cross-modal consistency can compromise MSF-based perception, revealing a critical vulnerability in the core data-fusion logic of modern autonomous vehicle systems.
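To make the idea of cross-modal alignment concrete, the following is a minimal sketch, not the implementation used in this work. It samples a synthetic point cluster around a 3D location, projects it into the image with a pinhole model, and pastes a patch over the projected footprint, so that both modalities report an object at the same place. The function names, the intrinsics `K`, the extrinsic transform `T_cam_lidar`, the object size, and the patch contents are all hypothetical placeholders, not KITTI calibration values. The patch is only rescaled to the projected footprint, so its size shrinks with depth; a fuller perspective-aware insertion would presumably also warp it.

```python
import numpy as np

def synth_cluster(center, size, n_points=200, seed=0):
    """Sample points uniformly inside an axis-aligned box.
    Assumed LiDAR frame: x forward, y left, z up (metres)."""
    rng = np.random.default_rng(seed)
    offsets = (rng.random((n_points, 3)) - 0.5) * np.asarray(size, dtype=float)
    return np.asarray(center, dtype=float) + offsets

def lidar_to_pixels(points, K, T_cam_lidar):
    """Project Nx3 LiDAR points to pixel coordinates (u, v) with a pinhole model."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = (T_cam_lidar @ homo.T).T[:, :3]   # camera frame: x right, y down, z forward
    cam = cam[cam[:, 2] > 0]                # keep points in front of the camera
    uvw = (K @ cam.T).T
    return uvw[:, :2] / uvw[:, 2:3]

def paste_patch(image, patch, box):
    """Nearest-neighbour rescale of `patch` into box=(u0, v0, u1, v1).
    Simplification: the whole patch is rescaled to the visible part of the box."""
    h, w = image.shape[:2]
    u0, v0 = max(int(box[0]), 0), max(int(box[1]), 0)
    u1, v1 = min(int(box[2]), w), min(int(box[3]), h)
    if u1 <= u0 or v1 <= v0:
        return image.copy()
    rows = np.arange(v1 - v0) * patch.shape[0] // (v1 - v0)
    cols = np.arange(u1 - u0) * patch.shape[1] // (u1 - u0)
    out = image.copy()
    out[v0:v1, u0:u1] = patch[rows][:, cols]
    return out

# Hypothetical calibration: identity translation, axis swap from LiDAR to camera.
K = np.array([[700.0, 0.0, 620.0],
              [0.0, 700.0, 190.0],
              [0.0, 0.0, 1.0]])
T_cam_lidar = np.array([[0.0, -1.0, 0.0, 0.0],
                        [0.0, 0.0, -1.0, 0.0],
                        [1.0, 0.0, 0.0, 0.0],
                        [0.0, 0.0, 0.0, 1.0]])

# Fake object 15 m ahead, roughly car-sized (length, width, height in metres).
cluster = synth_cluster(center=(15.0, 0.0, -0.8), size=(4.0, 1.8, 1.5))
uv = lidar_to_pixels(cluster, K, T_cam_lidar)

# 2D footprint of the cluster in the image: this is where the patch goes.
u0, v0 = np.floor(uv.min(axis=0)).astype(int)
u1, v1 = np.ceil(uv.max(axis=0)).astype(int)

image = np.zeros((375, 1242, 3), dtype=np.uint8)   # blank stand-in for a camera frame
patch = np.full((32, 32, 3), 255, dtype=np.uint8)  # stand-in for the spoofed appearance
spoofed = paste_patch(image, patch, (u0, v0, u1, v1))
```

In an early-fusion setting, `cluster` would be appended to the scene's point cloud and `spoofed` would replace the camera frame, so the perception model receives two inputs that agree on the same false object.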