We introduce an online 2D-to-3D semantic instance mapping algorithm that aims to generate comprehensive, accurate, and efficient semantic 3D maps for autonomous agents operating in unstructured environments. The approach builds on the voxel-based truncated signed distance function (Voxel-TSDF) representation used in recent algorithms. It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantically consistent and instance-consistent 3D regions. Graph optimization-based semantic labeling and instance refinement yield further improvements. The proposed method achieves higher accuracy than the state of the art on public large-scale datasets, improving on a number of widely used metrics. We also highlight a shortcoming in the evaluation protocol of recent studies: using the ground-truth trajectory as input instead of a SLAM-estimated one substantially affects accuracy, creating a large gap between reported results and actual performance on real-world data.
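To make the idea of integrating semantic prediction confidence during mapping more concrete, the following is a minimal sketch of one generic scheme: each voxel accumulates per-label votes weighted by the 2D network's confidence, and the voxel takes the label with the largest accumulated weight. This is an illustrative assumption, not the fusion rule used in the paper; the class `VoxelLabelFusion`, its methods, and the tuple-based voxel index are hypothetical names introduced only for this example.

```python
from collections import defaultdict


class VoxelLabelFusion:
    """Confidence-weighted label voting per voxel (illustrative sketch only)."""

    def __init__(self):
        # Maps voxel index -> {label: accumulated confidence}.
        self._votes = defaultdict(lambda: defaultdict(float))

    def integrate(self, voxel, label, confidence):
        """Add one 2D prediction, projected into `voxel`, weighted by its confidence."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        self._votes[voxel][label] += confidence

    def label_of(self, voxel):
        """Return the label with the highest accumulated confidence, or None."""
        votes = self._votes.get(voxel)
        if not votes:
            return None
        return max(votes, key=votes.get)


fusion = VoxelLabelFusion()
voxel = (0, 0, 0)
# Two low-confidence "chair" predictions and one high-confidence "table" one.
fusion.integrate(voxel, "chair", 0.3)
fusion.integrate(voxel, "chair", 0.3)
fusion.integrate(voxel, "table", 0.9)
print(fusion.label_of(voxel))
```

In this toy case a plain majority vote would pick "chair" (two observations against one), whereas the confidence-weighted vote favors the single high-confidence "table" prediction. That difference is the kind of effect confidence-aware integration is meant to capture.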