For scene understanding in unstructured environments, accurate and uncertainty-aware metric-semantic mapping is required to enable informed action selection by autonomous systems. Existing mapping methods often suffer from overconfident semantic predictions and from sparse, noisy depth sensing, leading to inconsistent map representations. In this paper, we therefore introduce EvidMTL, a multi-task learning framework that uses evidential heads for depth estimation and semantic segmentation, enabling uncertainty-aware inference from monocular RGB images. To enable uncertainty-calibrated evidential multi-task learning, we propose a novel evidential depth loss function that jointly optimizes the belief strength of the depth prediction in conjunction with an evidential segmentation loss. Building on this, we present EvidKimera, an uncertainty-aware semantic surface mapping framework that uses evidential depth and semantics predictions for improved 3D metric-semantic consistency. We train and evaluate EvidMTL on NYUDepthV2 and assess its zero-shot performance on ScanNetV2, demonstrating superior uncertainty estimation compared to conventional approaches while maintaining comparable depth estimation and semantic segmentation performance. In zero-shot mapping tests on ScanNetV2, EvidKimera outperforms Kimera in semantic surface mapping accuracy and consistency, highlighting the benefits of uncertainty-aware mapping and underscoring its potential for real-world robotic applications.