NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models

From arXiv. Accepted to IEEE Transactions on Image Processing (TIP). The source code will be made publicly available at https://github.com/lynn-yu/NRSeg

Bird's-Eye-View (BEV) semantic segmentation is an indispensable perception task in end-to-end autonomous driving systems. Unsupervised and semi-supervised learning for BEV tasks, though pivotal for real-world applications, underperforms because the labeled data are homogeneously distributed. In this work, we explore the potential of synthetic data from driving world models to diversify the labeled data and thereby make BEV segmentation more robust. Yet our preliminary findings reveal that generation noise in synthetic data compromises efficient BEV model learning. To fully harness the potential of synthetic data from world models, this paper proposes NRSeg, a noise-resilient learning framework for BEV semantic segmentation. Specifically, a Perspective-Geometry Consistency Metric (PGCM) is proposed to quantitatively evaluate how well generated data can guide model learning. The metric is derived from measuring the alignment between the perspective-view road mask of the generated data and the mask projected from the BEV labels. Moreover, a Bi-Distribution Parallel Prediction (BiDPP) scheme is designed to enhance the inherent robustness of the model by constraining the learning process through parallel prediction of multinomial and Dirichlet distributions. The former efficiently predicts semantic probabilities, whereas the latter adopts evidential deep learning to quantify uncertainty. Furthermore, a Hierarchical Local Semantic Exclusion (HLSE) module is designed to address the non-mutual exclusivity inherent in BEV semantic segmentation tasks. Experimental results demonstrate that NRSeg achieves state-of-the-art performance, improving mIoU by up to 13.8% and 11.4% in unsupervised and semi-supervised BEV segmentation tasks, respectively.
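To make the contrast between the two prediction branches concrete, the sketch below compares a softmax (multinomial) output with a Dirichlet output for a single BEV cell, using the common evidential deep learning parameterization (evidence = softplus(logits), alpha = evidence + 1, uncertainty = K / sum(alpha)). This is an illustrative assumption and not the NRSeg implementation: the function names, the softplus evidence function, and the treatment of classes as mutually exclusive are all hypothetical simplifications.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus, log(1 + exp(x)); always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softmax(logits):
    """Multinomial branch: class probabilities only, no uncertainty estimate."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dirichlet_head(logits):
    """Dirichlet branch (assumed parameterization) for one BEV cell.

    Returns the expected class probabilities and a scalar uncertainty
    in (0, 1] that approaches 1 when the total evidence is near zero.
    """
    evidence = [softplus(z) for z in logits]   # non-negative evidence per class
    alpha = [e + 1.0 for e in evidence]        # Dirichlet concentration parameters
    strength = sum(alpha)                      # total Dirichlet strength S
    probs = [a / strength for a in alpha]      # mean of the Dirichlet
    uncertainty = len(alpha) / strength        # u = K / S
    return probs, uncertainty


# A cell with strong evidence for class 0 versus a cell with almost no evidence.
confident_probs, confident_u = dirichlet_head([8.0, -4.0, -4.0])
ambiguous_probs, ambiguous_u = dirichlet_head([-4.0, -4.0, -4.0])

# Softmax is uniform for the ambiguous cell but gives no signal that the
# model lacks evidence; the Dirichlet uncertainty makes that explicit.
ambiguous_softmax = softmax([-4.0, -4.0, -4.0])
```

In this sketch the low-evidence cell receives an uncertainty close to 1, while the cell with strong evidence for one class receives a much lower value. A per-cell signal of this kind could, in principle, be used to down-weight noisy synthetic samples during training.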
