We introduce an online 2D-to-3D semantic instance mapping algorithm that generates comprehensive, accurate, and efficient semantic 3D maps for autonomous agents operating in unstructured environments. The approach builds on the voxel-TSDF (truncated signed distance function) representation used in recent algorithms. It introduces novel ways of integrating semantic prediction confidence during mapping, producing semantically consistent and instance-consistent 3D regions. Graph optimization-based semantic labeling and instance refinement yield further improvements. The proposed method achieves accuracy superior to the state of the art on public large-scale datasets, improving on a number of widely used metrics. We also highlight a pitfall in the evaluation of recent studies: using the ground-truth trajectory as input instead of a SLAM-estimated one substantially affects accuracy, creating a large gap between reported results and actual performance on real-world data.
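To make the idea of integrating semantic prediction confidence during mapping concrete, the following is a minimal sketch and not the method of this paper: it assumes each 2D prediction projected into a voxel carries a label and a confidence in [0, 1], and it fuses them by accumulating confidence per label. The class `VoxelLabelFusion` and its methods are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


class VoxelLabelFusion:
    """Hypothetical helper: accumulates per-voxel, per-label confidence."""

    def __init__(self):
        # voxel index -> {label -> accumulated confidence}
        self.weights = defaultdict(lambda: defaultdict(float))

    def integrate(self, voxel, label, confidence):
        """Add one observation of `label` at `voxel`, weighted by its confidence."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        self.weights[voxel][label] += confidence

    def label(self, voxel):
        """Return the label with the largest accumulated confidence, or None."""
        scores = self.weights.get(voxel)
        if not scores:
            return None
        return max(scores, key=scores.get)


fusion = VoxelLabelFusion()
fusion.integrate((0, 0, 0), "chair", 0.9)   # one confident observation
fusion.integrate((0, 0, 0), "table", 0.4)   # two uncertain observations
fusion.integrate((0, 0, 0), "table", 0.4)
print(fusion.label((0, 0, 0)))
```

Under plain majority voting the two "table" observations would win; weighting by confidence lets the single confident "chair" prediction dominate, which is the kind of behavior a confidence-aware integration scheme is meant to capture.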