Accurate localization is a fundamental requirement for autonomous robots operating in indoor environments. Scene graphs encode the spatial structure of an environment as a hierarchy of semantic entities and their relationships, and can be constructed both online from robot sensor data and offline from architectural priors such as Building Information Models (BIM). Matching these two complementary representations enables drift correction in SLAM by grounding robot observations against a known structural prior. However, establishing reliable node-to-node correspondences between them remains an open challenge: existing combinatorial methods are prohibitively expensive at scale, and prior learned approaches address only flat graph matching, ignoring the multi-level semantic structure present in both representations. Here we present a learned, end-to-end differentiable pipeline that augments both graphs with semantically motivated edge types encoding intra- and inter-level relationships, explicitly exploiting this hierarchy to enable simultaneous matching from high-level room concepts down to low-level wall surfaces. Trained exclusively on floor plans, the proposed method outperforms the combinatorial baseline in F1 on real LiDAR environments while running an order of magnitude faster, demonstrating viable zero-shot generalization for BIM-assisted robot localization.
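To make the idea of intra- and inter-level edge types concrete, the following is a minimal sketch of a two-level scene graph (rooms and wall surfaces) augmented with typed edges. It is an illustration only, not the pipeline described above: the edge-type labels, the `SceneGraph` class, and the `augment_with_typed_edges` helper are all hypothetical names, and the rule used to connect wall surfaces (linking walls that belong to the same room) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass, field

# Hypothetical edge-type labels; the abstract does not name its edge types.
ROOM_ADJACENT = "room-room"    # intra-level: between two room nodes
WALL_SAME_ROOM = "wall-wall"   # intra-level: between walls of one room (assumed rule)
ROOM_HAS_WALL = "room-wall"    # inter-level: from a room to one of its wall surfaces


@dataclass
class SceneGraph:
    """Toy hierarchical graph: nodes carry a level, edges carry a type."""
    levels: dict = field(default_factory=dict)  # node id -> "room" or "wall"
    edges: set = field(default_factory=set)     # (src, dst, edge_type) triples

    def add_node(self, node_id, level):
        self.levels[node_id] = level

    def add_edge(self, src, dst, edge_type):
        self.edges.add((src, dst, edge_type))

    def edges_of_type(self, edge_type):
        return sorted(e for e in self.edges if e[2] == edge_type)


def augment_with_typed_edges(graph, room_walls):
    """Add inter-level room->wall edges and intra-level wall-wall edges.

    room_walls maps each room id to the list of wall ids it contains.
    """
    for room, walls in room_walls.items():
        for wall in walls:
            graph.add_edge(room, wall, ROOM_HAS_WALL)
        # Connect every unordered pair of walls that share this room.
        for i, first in enumerate(walls):
            for second in walls[i + 1:]:
                graph.add_edge(first, second, WALL_SAME_ROOM)
    return graph


def build_example():
    """Two adjacent rooms with three wall surfaces in total."""
    graph = SceneGraph()
    for room in ("room_a", "room_b"):
        graph.add_node(room, "room")
    for wall in ("wall_1", "wall_2", "wall_3"):
        graph.add_node(wall, "wall")
    graph.add_edge("room_a", "room_b", ROOM_ADJACENT)
    room_walls = {"room_a": ["wall_1", "wall_2"], "room_b": ["wall_3"]}
    return augment_with_typed_edges(graph, room_walls)
```

Applying the same augmentation to both the sensor-derived graph and the BIM-derived graph would give a matcher edge types it can treat differently per level; how the learned model actually consumes these types is not specified here.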