LiDAR-based 3D object detectors are widely used in applications such as autonomous vehicles and mobile robots. However, they often fail to adapt to target domains that differ in sensor configuration (e.g., sensor type, spatial resolution, or field of view) or in geographic location. Reducing such gaps usually requires collecting and annotating a dataset for each new setup, which is expensive and time-consuming. Recent studies suggest that backbones can be pre-trained in a self-supervised manner on large-scale unlabeled LiDAR frames. Despite their expressive representations, however, such backbones still struggle to generalize without substantial amounts of target-domain data. We therefore propose Domain Adaptive Distill-Tuning (DADT), a novel method that adapts a pre-trained model using limited target data (approximately 100 LiDAR frames) while retaining its representation power and preventing it from overfitting. Specifically, we use regularizers in a teacher-student architecture to align object-level and context-level representations between the pre-trained and finetuned models. Experiments on the Waymo Open Dataset and KITTI driving benchmarks confirm that our method effectively finetunes a pre-trained model, achieving significant gains in accuracy.
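The teacher-student regularization described above can be illustrated with a minimal sketch of a combined training objective. It rests on assumptions that the abstract does not state: both alignment terms are taken to be mean-squared errors between features of the frozen pre-trained teacher and the finetuned student, object-level features are assumed to be pooled per matched box, context-level features are assumed to be a single scene-level vector, and the terms are added to the detection loss with hypothetical weights `lambda_obj` and `lambda_ctx`. All function names are illustrative, and plain Python lists stand in for tensors.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature dimensions must match")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def object_alignment(student_objs: List[Vector], teacher_objs: List[Vector]) -> float:
    """Average feature distance over matched per-object (box-pooled) features.

    Assumption: the i-th student feature and the i-th teacher feature
    describe the same object.
    """
    if len(student_objs) != len(teacher_objs):
        raise ValueError("object lists must be matched one-to-one")
    if not student_objs:
        return 0.0
    total = sum(mse(s, t) for s, t in zip(student_objs, teacher_objs))
    return total / len(student_objs)


def distill_tuning_loss(
    det_loss: float,
    student_objs: List[Vector],
    teacher_objs: List[Vector],
    student_ctx: Vector,
    teacher_ctx: Vector,
    lambda_obj: float = 1.0,  # hypothetical weight for the object-level term
    lambda_ctx: float = 1.0,  # hypothetical weight for the context-level term
) -> float:
    """Detection loss plus object- and context-level alignment regularizers.

    The teacher features are treated as fixed targets; in a real framework
    they would come from the frozen pre-trained model with gradients stopped.
    """
    l_obj = object_alignment(student_objs, teacher_objs)
    l_ctx = mse(student_ctx, teacher_ctx)
    return det_loss + lambda_obj * l_obj + lambda_ctx * l_ctx


if __name__ == "__main__":
    teacher_objs = [[1.0, 0.0], [0.5, 0.5]]
    student_objs = [[1.0, 0.0], [0.5, 0.5]]  # object features already aligned
    teacher_ctx = [0.0, 0.0, 0.0, 0.0]
    student_ctx = [1.0, 1.0, 1.0, 1.0]       # context features have drifted
    loss = distill_tuning_loss(0.5, student_objs, teacher_objs,
                               student_ctx, teacher_ctx, lambda_ctx=0.1)
    print(f"total loss: {loss:.3f}")
```

In this toy case only the context-level term is nonzero, so the regularizer penalizes the student for drifting away from the teacher's scene-level representation even though its object-level features still match. The intent is that, with only about 100 target frames, such penalties keep the finetuned model close to the pre-trained representation.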