The ability to deploy robots that operate safely in diverse environments is crucial for developing embodied intelligent agents. As a community, we have made tremendous progress in within-domain LiDAR semantic segmentation. However, do these methods generalize across domains? To answer this question, we design the first experimental setup for studying domain generalization (DG) for LiDAR semantic segmentation (DG-LSS). Our results confirm a significant performance gap when methods are evaluated in a cross-domain setting: for example, a model trained on the source dataset (SemanticKITTI) obtains $26.53$ mIoU on the target data, compared to $48.49$ mIoU obtained by a model trained on the target domain (nuScenes). To tackle this gap, we propose the first method specifically designed for DG-LSS, which obtains $34.88$ mIoU on the target domain, outperforming all baselines. Our method augments a sparse-convolutional encoder-decoder 3D segmentation network with an additional, dense 2D convolutional decoder that learns to classify a bird's-eye view of the point cloud. This simple auxiliary task encourages the 3D network to learn features that are robust to shifts in sensor placement and resolution and that transfer across domains. With this work, we aim to inspire the community to develop and evaluate future models in such cross-domain conditions.
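To make the auxiliary bird's-eye-view task more concrete, the following minimal sketch shows one way a 2D label grid could be derived from a labeled point cloud. The abstract does not say how the bird's-eye-view target is constructed, so everything here is an assumption made for illustration: the function name `rasterize_bev_labels`, the grid extent, the cell size, the majority-vote rule, and the `ignore_label` value are all hypothetical.

```python
from collections import Counter


def rasterize_bev_labels(points, labels,
                         x_range=(-50.0, 50.0), y_range=(-50.0, 50.0),
                         cell_size=0.5, ignore_label=-1):
    """Project per-point semantic labels onto a 2D bird's-eye-view grid.

    Hypothetical helper: each cell takes the most frequent label among the
    points that fall into it; cells without points keep `ignore_label`.
    `points` is an iterable of (x, y, z) tuples in sensor coordinates.
    """
    n_cols = int(round((x_range[1] - x_range[0]) / cell_size))
    n_rows = int(round((y_range[1] - y_range[0]) / cell_size))

    votes = {}  # (row, col) -> Counter of labels seen in that cell
    for (x, y, _z), label in zip(points, labels):
        # Discard points outside the assumed grid extent.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # The height coordinate is dropped: this is the top-down projection.
        col = min(int((x - x_range[0]) // cell_size), n_cols - 1)
        row = min(int((y - y_range[0]) // cell_size), n_rows - 1)
        votes.setdefault((row, col), Counter())[label] += 1

    grid = [[ignore_label] * n_cols for _ in range(n_rows)]
    for (row, col), counter in votes.items():
        grid[row][col] = counter.most_common(1)[0][0]
    return grid


# Toy example: a 2 m x 2 m area split into 1 m cells.
toy_points = [(0.2, 0.2, 0.0), (0.3, 0.1, 1.0), (0.4, 0.4, 0.5),
              (1.6, 0.2, 0.0), (100.0, 0.0, 0.0)]
toy_labels = [1, 1, 2, 3, 4]
toy_grid = rasterize_bev_labels(toy_points, toy_labels,
                                x_range=(0.0, 2.0), y_range=(0.0, 2.0),
                                cell_size=1.0)
```

Under these assumptions, a dense 2D decoder could be supervised on such a grid with empty cells excluded from the loss. One plausible reading of the abstract is that a top-down target like this varies less with sensor mounting and beam resolution than the raw point pattern does, but the sketch does not reproduce the authors' actual pipeline.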