The ability to deploy robots that can operate safely in diverse environments is crucial for developing embodied intelligent agents. As a community, we have made tremendous progress in within-domain LiDAR semantic segmentation. However, do these methods generalize across domains? To answer this question, we design the first experimental setup for studying domain generalization (DG) for LiDAR semantic segmentation (DG-LSS). Our results confirm a significant performance gap when methods are evaluated in a cross-domain setting: for example, a model trained on the source dataset (SemanticKITTI) obtains $26.53$ mIoU on the target data (nuScenes), compared to $48.49$ mIoU obtained by a model trained on the target domain itself. To tackle this gap, we propose the first method specifically designed for DG-LSS, which obtains $34.88$ mIoU on the target domain, outperforming all baselines. Our method augments a sparse-convolutional encoder-decoder 3D segmentation network with an additional, dense 2D convolutional decoder that learns to classify a bird's-eye view of the point cloud. This simple auxiliary task encourages the 3D network to learn features that are robust to shifts in sensor placement and resolution and that transfer across domains. With this work, we aim to inspire the community to develop and evaluate future models in such cross-domain conditions.
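To make the auxiliary bird's-eye-view task more concrete, the following minimal sketch shows one way a labeled point cloud could be rasterized into a 2D label grid that a dense 2D decoder might be trained to predict. It is an illustration under stated assumptions, not the authors' implementation: the function name `bev_label_grid`, the grid extents, the cell size, the majority-vote rule, and the `ignore` value for empty cells are all hypothetical choices that the abstract does not specify.

```python
from collections import Counter, defaultdict


def bev_label_grid(points, labels, x_range=(-50.0, 50.0),
                   y_range=(-50.0, 50.0), cell=0.5, ignore=-1):
    """Rasterize labeled 3D points into a 2D bird's-eye-view label grid.

    Assumptions (not taken from the paper): height is discarded, each cell
    takes the most frequent label among its points, and empty cells are
    marked with `ignore`.
    """
    n_rows = int(round((x_range[1] - x_range[0]) / cell))
    n_cols = int(round((y_range[1] - y_range[0]) / cell))

    # Collect per-cell label votes; only occupied cells are stored.
    votes = defaultdict(Counter)
    for (x, y, _z), label in zip(points, labels):
        inside_x = x_range[0] <= x < x_range[1]
        inside_y = y_range[0] <= y < y_range[1]
        if not (inside_x and inside_y):
            continue  # drop points outside the grid extent
        # Clamp to guard against floating-point edge cases at the border.
        row = min(int((x - x_range[0]) // cell), n_rows - 1)
        col = min(int((y - y_range[0]) // cell), n_cols - 1)
        votes[(row, col)][label] += 1

    # Fill the dense grid with the majority label of each occupied cell.
    grid = [[ignore] * n_cols for _ in range(n_rows)]
    for (row, col), counter in votes.items():
        grid[row][col] = counter.most_common(1)[0][0]
    return grid


# Tiny example: a 2 m x 2 m area split into 1 m cells.
points = [(0.2, 0.2, 0.0), (0.3, 0.1, 1.0), (0.4, 0.4, 0.5),
          (1.5, 0.2, 0.0), (10.0, 0.0, 0.0)]
labels = [1, 1, 2, 3, 4]
grid = bev_label_grid(points, labels, x_range=(0.0, 2.0),
                      y_range=(0.0, 2.0), cell=1.0)
```

In a setup of this kind, the resulting grid would serve as the target for a 2D classification loss that is added to the usual per-point 3D segmentation loss. How the two losses are weighted is not stated in the abstract.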