Modern autonomous driving perception systems utilize complementary multi-modal sensors, such as LiDAR and cameras. Although sensor fusion architectures enhance performance in challenging environments, they still suffer significant performance drops under severe sensor failures, such as LiDAR beam reduction, LiDAR drop, limited field of view, camera drop, and occlusion. This limitation stems from inter-modality dependencies in current sensor fusion frameworks. In this study, we introduce an efficient and robust LiDAR-camera 3D object detector, referred to as MoME, which achieves robust performance through a mixture-of-experts approach. Our MoME fully decouples modality dependencies using three parallel expert decoders, which use camera features, LiDAR features, or a combination of both to decode object queries, respectively. We propose the Multi-Expert Decoding (MED) framework, where each query is decoded selectively by one of the three expert decoders. MoME utilizes an Adaptive Query Router (AQR) to select the most appropriate expert decoder for each query based on the quality of camera and LiDAR features. This ensures that each query is processed by the best-suited expert, resulting in robust performance across diverse sensor failure scenarios. We evaluated the performance of MoME on the nuScenes-R benchmark. Our MoME achieved state-of-the-art performance in extreme weather and sensor failure conditions, significantly outperforming existing models across various sensor failure scenarios.
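The per-query expert routing described above can be illustrated with a minimal sketch. All names and shapes here are hypothetical (the paper's decoders are transformer-based; fixed linear projections stand in for them), and the router weights are random placeholders rather than learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8          # illustrative feature dimension
N_QUERIES = 5  # number of object queries

# Hypothetical per-query features extracted from each modality.
cam_feats = rng.normal(size=(N_QUERIES, D))
lidar_feats = rng.normal(size=(N_QUERIES, D))

# Three decoupled expert decoders: camera-only, LiDAR-only, and fused.
# Each stands in for a full decoder; here, a fixed linear projection.
W_cam = rng.normal(size=(D, D))
W_lidar = rng.normal(size=(D, D))
W_fused = rng.normal(size=(2 * D, D))

def expert_camera(c, l):
    return c @ W_cam                            # ignores LiDAR entirely

def expert_lidar(c, l):
    return l @ W_lidar                          # ignores camera entirely

def expert_fused(c, l):
    return np.concatenate([c, l], -1) @ W_fused # uses both modalities

EXPERTS = [expert_camera, expert_lidar, expert_fused]

# Adaptive Query Router (sketch): scores both modalities per query and
# makes a hard per-query choice among the three experts via argmax.
W_route = rng.normal(size=(2 * D, 3))

def route(c, l):
    logits = np.concatenate([c, l], -1) @ W_route
    return logits.argmax(-1)                    # expert index per query

choices = route(cam_feats, lidar_feats)
decoded = np.stack([
    EXPERTS[k](cam_feats[i], lidar_feats[i])
    for i, k in enumerate(choices)
])
print(decoded.shape)  # one decoded embedding per query: (N_QUERIES, D)
```

Because the camera-only and LiDAR-only experts never touch the other modality, a query routed to either of them is unaffected by a failure in the unused sensor, which is the decoupling the abstract attributes to MoME's robustness.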