Recent LiDAR-based 3D object detection (3DOD) methods show promising results, but they often fail to generalize to target domains outside the source (training) data distribution. To reduce this domain gap and make 3DOD models more generalizable, we introduce a novel unsupervised domain adaptation (UDA) method called CMDA. First, CMDA leverages visual semantic cues from the image modality (i.e., camera images) as an effective semantic bridge to close the domain gap in cross-modal Bird's Eye View (BEV) representations. Second, it adopts a self-training-based learning strategy in which the model is adversarially trained to generate domain-invariant features, making it hard to discriminate whether a feature instance comes from the source domain or an unseen target domain. Overall, the CMDA framework guides the 3DOD model to generate highly informative and domain-adaptive features for novel data distributions. In extensive experiments on large-scale benchmarks such as nuScenes, Waymo, and KITTI, these two components provide significant performance gains on UDA tasks and achieve state-of-the-art performance.
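To make the adversarial component concrete, the following is a minimal sketch of a generic domain-adversarial objective, not the exact CMDA formulation, which the abstract does not specify. All symbols are illustrative assumptions: $F$ is the feature extractor with parameters $\theta_F$, $H$ the detection head with parameters $\theta_H$, $D$ a domain discriminator with parameters $\theta_D$ that outputs the probability that a feature comes from the source domain, $\mathcal{S}$ and $\mathcal{T}$ the source and target data distributions, and $\lambda$ a trade-off weight.

```latex
% Assumed generic form of domain-adversarial training (illustrative only).
% Domain classification loss: binary cross-entropy of the discriminator D.
\mathcal{L}_{\mathrm{dom}}(\theta_F, \theta_D)
  = -\,\mathbb{E}_{x \sim \mathcal{S}}\big[\log D(F(x))\big]
    -\,\mathbb{E}_{x \sim \mathcal{T}}\big[\log\big(1 - D(F(x))\big)\big]

% Min-max objective: D minimizes L_dom, while F is trained to increase it,
% alongside the usual detection loss L_det.
\min_{\theta_F,\,\theta_H}\;\max_{\theta_D}\;
  \mathcal{L}_{\mathrm{det}}(\theta_F, \theta_H)
  \;-\; \lambda\,\mathcal{L}_{\mathrm{dom}}(\theta_F, \theta_D)
```

Under this assumed form, the discriminator learns to separate source features from target features, while the feature extractor is pushed toward features for which that separation fails, which is the sense in which the features become domain-invariant.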