Fiducial markers can encode rich information about the environment and can help Visual SLAM (VSLAM) approaches reconstruct maps with practical semantic information. Current marker-based VSLAM approaches mainly use markers to improve feature detection in low-feature environments and/or to incorporate loop closure constraints; they generate only low-level geometric maps, which are prone to inaccuracies in complex environments. To bridge this gap, this paper presents a VSLAM approach that uses a monocular camera together with fiducial markers to generate hierarchical representations of the environment while improving the camera pose estimate. The proposed approach detects semantic entities in the surroundings, including walls, corridors, and rooms encoded within markers, and adds topological constraints among them. Experimental results on a real-world dataset collected with a robot demonstrate that, owing to these additional constraints, the proposed approach is more accurate than a traditional marker-based VSLAM baseline while also producing enhanced map representations. Furthermore, the quality of the reconstructed map compares satisfactorily with that of a map reconstructed using a LiDAR SLAM approach.
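To make the idea of topological constraints among markers, walls, and rooms or corridors more concrete, the following is a minimal sketch and not the implementation described in the paper. It assumes a hypothetical lookup table, `MARKER_SEMANTICS`, that maps each marker ID to the wall it is attached to and the room or corridor that wall belongs to; the class and method names are likewise illustrative. The sketch covers only the bookkeeping of the hierarchy and omits pose estimation and optimization entirely.

```python
from dataclasses import dataclass, field

# Hypothetical lookup: marker ID -> (wall ID, room or corridor ID).
# The IDs and names below are invented for illustration only.
MARKER_SEMANTICS = {
    0: ("wall_a", "room_1"),
    1: ("wall_b", "room_1"),
    2: ("wall_c", "corridor_1"),
}


@dataclass
class HierarchicalMap:
    # Each edge is a (child, parent) pair standing in for a topological constraint.
    edges: set = field(default_factory=set)

    def observe_marker(self, marker_id):
        """Register a detected marker and link it up the hierarchy.

        Returns False when the marker has no semantic entry, in which case
        it would only serve as a plain geometric landmark.
        """
        if marker_id not in MARKER_SEMANTICS:
            return False
        wall, space = MARKER_SEMANTICS[marker_id]
        self.edges.add((f"marker_{marker_id}", wall))  # marker lies on wall
        self.edges.add((wall, space))                  # wall bounds room/corridor
        return True

    def children(self, parent):
        """Return the sorted entities directly attached to `parent`."""
        return sorted(child for child, p in self.edges if p == parent)


hmap = HierarchicalMap()
for detected in (0, 1, 2, 7):  # marker 7 has no semantic entry
    hmap.observe_marker(detected)
```

In a full system, each edge of this kind would presumably correspond to a factor in the optimization back end rather than a simple set entry; that mapping is an assumption of this sketch, not a detail stated in the abstract.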