Closing the domain gap between training and deployment and incorporating multiple sensor modalities are two challenging yet critical topics for self-driving. Existing work focuses on only one of these topics, overlooking the simultaneous domain and modality shift that is pervasive in real-world scenarios. For example, a model trained with multi-sensor data collected in Europe may need to run in Asia with only a subset of the input sensors available. In this work, we propose DualCross, a cross-modality cross-domain adaptation framework that facilitates the learning of a more robust monocular bird's-eye-view (BEV) perception model. It transfers point cloud knowledge from a LiDAR sensor in one domain during training to a camera-only testing scenario in a different domain. This work results in the first open analysis of cross-domain cross-sensor perception and adaptation for monocular 3D tasks in the wild. We benchmark our approach on large-scale datasets under a wide range of domain shifts and show state-of-the-art results against various baselines.