Semantic segmentation networks trained under full supervision for one type of lidar fail to generalize to unseen lidars without intervention. To reduce the performance gap under domain shifts, a recent trend is to leverage vision foundation models (VFMs), which provide robust features across domains. In this work, we conduct an exhaustive study to identify recipes for exploiting VFMs in unsupervised domain adaptation for semantic segmentation of lidar point clouds. Building upon unsupervised image-to-lidar knowledge distillation, our study reveals that: (1) the architecture of the lidar backbone is key to maximizing generalization performance on a target domain; (2) it is possible to pretrain a single backbone once and for all, and use it to address many domain shifts; (3) best results are obtained by keeping the pretrained backbone frozen and training an MLP head for semantic segmentation. The resulting pipeline achieves state-of-the-art results in four widely recognized and challenging settings. The code will be available at: https://github.com/valeoai/muddos.
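The third finding amounts to a simple linear-probing-style recipe: freeze the VFM-distilled lidar backbone and optimize only a small point-wise MLP head on the target task. Below is a minimal sketch of that setup in PyTorch; the backbone object, its feature dimension, and all hyperparameters are placeholders for illustration, not the actual architecture or API of the muddos repository.

```python
# Illustrative sketch of the "frozen pretrained backbone + trainable MLP head" recipe.
# `backbone`, `feat_dim`, and the optimizer settings are hypothetical placeholders.
import torch
import torch.nn as nn


class SegmentationHead(nn.Module):
    """Small point-wise MLP mapping frozen backbone features to per-point class logits."""

    def __init__(self, feat_dim: int, num_classes: int, hidden_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (num_points, feat_dim) -> logits: (num_points, num_classes)
        return self.mlp(feats)


def build_probe_setup(backbone: nn.Module, feat_dim: int, num_classes: int):
    # Freeze the pretrained backbone: no gradients, and eval mode so that
    # normalization statistics stay fixed while the head is trained.
    for p in backbone.parameters():
        p.requires_grad = False
    backbone.eval()

    head = SegmentationHead(feat_dim, num_classes)
    # Only the head parameters are optimized.
    optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
    return head, optimizer
```

In a training loop, the backbone would run under `torch.no_grad()` to produce per-point features, and only the head's cross-entropy loss would be backpropagated; this keeps the domain-robust pretrained representation intact while adapting the classifier to the target label set.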