We introduce Multi-Source 3D (MS3D), a new self-training pipeline for unsupervised domain adaptation in 3D object detection. Despite their remarkable accuracy, 3D detectors often overfit to domain-specific biases, leading to suboptimal performance across different sensor setups and environments. Existing methods typically focus on adapting a single detector to the target domain, overlooking the fact that different detectors have distinct strengths on different unseen domains. MS3D exploits this by combining different pre-trained detectors from multiple source domains and incorporating temporal information to produce high-quality pseudo-labels for fine-tuning. Our proposed Kernel-Density Estimation (KDE) Box Fusion method fuses box proposals from multiple domains to obtain pseudo-labels that surpass the performance of the best source-domain detectors. MS3D exhibits greater robustness to domain shift and produces accurate pseudo-labels over greater distances, making it well-suited for high-to-low beam domain adaptation and vice versa. Our method achieves state-of-the-art performance on all evaluated datasets, and we demonstrate that the pre-trained detector's source dataset has minimal impact on the fine-tuned result, making MS3D suitable for real-world applications.
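The abstract names KDE Box Fusion but does not spell out the procedure. As a rough illustration of the general idea only, the following sketch assumes that proposals from several detectors have already been matched to the same object, that each proposal is a tuple `(x, y, z, l, w, h, yaw, score)`, and that each box parameter is fused independently by taking the mode of a score-weighted 1D Gaussian KDE. The function names `kde_mode` and `fuse_boxes`, the bandwidth values, and the grid search are hypothetical choices made for this example, not details taken from MS3D.

```python
import math


def kde_mode(values, weights, bandwidth, grid_size=201):
    """Return the grid point that maximises a weighted 1D Gaussian KDE."""
    lo, hi = min(values), max(values)
    if hi - lo < 1e-9:
        # All proposals agree on this parameter; nothing to fuse.
        return values[0]
    best_x, best_density = lo, -1.0
    for i in range(grid_size):
        x = lo + (hi - lo) * i / (grid_size - 1)
        # Unnormalised density: the normalising constant does not move the peak.
        density = sum(
            w * math.exp(-0.5 * ((x - v) / bandwidth) ** 2)
            for v, w in zip(values, weights)
        )
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def fuse_boxes(boxes, bandwidths=(0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.1)):
    """Fuse matched proposals (x, y, z, l, w, h, yaw, score) into one box.

    Assumption: yaw angles are already aligned, so wrap-around at +/- pi
    is not handled here.
    """
    weights = [b[7] for b in boxes]
    return tuple(
        kde_mode([b[k] for b in boxes], weights, bandwidths[k])
        for k in range(7)
    )


if __name__ == "__main__":
    # Two detectors roughly agree; a third is an outlier in x and yaw.
    proposals = [
        (10.0, 5.0, -1.0, 4.5, 1.9, 1.6, 0.00, 0.9),
        (10.1, 5.0, -1.0, 4.5, 1.9, 1.6, 0.02, 0.8),
        (12.0, 5.0, -1.0, 4.5, 1.9, 1.6, 0.50, 0.4),
    ]
    print(fuse_boxes(proposals))
```

Taking the density peak rather than a score-weighted mean keeps the fused box near the cluster of agreeing proposals, so a single poorly localised detector pulls the result far less than it would in an average.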