3D scene graphs offer an efficient representation of the environment by hierarchically organizing diverse semantic entities and the topological relationships among them. Fiducial markers, in turn, provide a practical mechanism for encoding information about an environment and the objects within it. In Visual SLAM (VSLAM), especially when the reconstructed maps carry practical semantic information, such markers can enrich the map with additional semantics and establish meaningful connections among the semantic objects. This paper therefore exploits fiducial markers to equip a VSLAM framework with hierarchical representations, generating optimizable, multi-layered, vision-based situational graphs. The framework comprises a conventional VSLAM system with low-level feature tracking and mapping capabilities, augmented with a fiducial marker map. The markers help identify walls and doors in the environment, which are then associated with high-level entities such as corridors and rooms. Experiments are conducted on a real-world dataset collected using various legged robots, with a Light Detection And Ranging (LiDAR)-based framework (S-Graphs) serving as the ground truth. The results show that our framework not only builds a richer, multi-layered hierarchical map of the environment but also improves robot pose accuracy compared with state-of-the-art methods.
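To make the layered structure concrete, the following is a minimal illustrative sketch, not the implementation described in this paper. It only shows how detected markers could be grouped under walls and doors, and those in turn under rooms and corridors. The names `Node`, `build_graph`, and the `marker_map` lookup are hypothetical, and the sketch omits pose estimation and graph optimization entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A vertex in one layer of the (hypothetical) situational graph."""
    name: str
    layer: str  # "marker", "structural" (wall/door), or "room" (room/corridor)
    children: List["Node"] = field(default_factory=list)


def build_graph(marker_map: Dict[int, Tuple[str, str]]) -> Dict[str, Node]:
    """Group markers under walls/doors, then under rooms/corridors.

    marker_map maps marker_id -> (structural_name, room_name). It is an
    assumed lookup standing in for the semantic information attached to
    each fiducial marker; structural names are assumed to be unique.
    """
    rooms: Dict[str, Node] = {}
    structurals: Dict[str, Node] = {}
    for marker_id, (structural_name, room_name) in sorted(marker_map.items()):
        # Create the high-level entity (room or corridor) on first sight.
        room = rooms.setdefault(room_name, Node(room_name, "room"))
        # Create the wall/door on first sight and attach it to its room.
        if structural_name not in structurals:
            structurals[structural_name] = Node(structural_name, "structural")
            room.children.append(structurals[structural_name])
        # Attach the marker to the wall/door it was placed on.
        structurals[structural_name].children.append(
            Node(f"marker_{marker_id}", "marker")
        )
    return rooms


if __name__ == "__main__":
    graph = build_graph({
        1: ("wall_a", "room_1"),
        2: ("wall_a", "room_1"),
        3: ("door_a", "corridor_1"),
    })
    for room in graph.values():
        for structural in room.children:
            markers = [m.name for m in structural.children]
            print(room.name, "->", structural.name, "->", markers)
```

In a full system each node would also carry a pose or plane estimate, and the edges would become constraints in the optimization; here they are plain parent-child links.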