Fiducial markers can encode rich information about the environment and can help Visual SLAM (VSLAM) approaches reconstruct maps with practical semantic information. Current marker-based VSLAM approaches mainly use markers to improve feature detection in low-feature environments and/or to incorporate loop closure constraints, and thus generate only low-level geometric maps that are prone to inaccuracies in complex environments. To bridge this gap, this paper presents a VSLAM approach that uses a monocular camera together with fiducial markers to generate hierarchical representations of the environment while improving the camera pose estimate. The proposed approach detects semantic entities in the surroundings, including walls, corridors, and rooms encoded within markers, and adds topological constraints among them. Experimental results on a real-world dataset collected with a robot demonstrate that, owing to these additional constraints, the proposed approach outperforms a traditional marker-based VSLAM baseline in accuracy while also creating enhanced map representations. Furthermore, the quality of its reconstructed map compares satisfactorily with that of a map reconstructed using a LiDAR SLAM approach.