While Open Set Semantic Mapping and 3D Semantic Scene Graphs (3DSSGs) are established paradigms in robotic perception, deploying them effectively to support high-level reasoning in large-scale, real-world environments remains a significant challenge. Most existing approaches decouple perception from representation, treating the scene graph as a derivative layer generated post hoc, which limits both consistency and scalability. In contrast, we propose a mapping architecture in which the 3DSSG serves as the foundational backend, acting as the primary knowledge representation for the entire mapping process. Our approach leverages prior work on incremental scene graph prediction to infer and update the graph structure in real time as the environment is explored. This ensures that the map remains topologically consistent and computationally efficient, even during extended operation in large-scale settings. By maintaining an explicit, spatially grounded representation that supports both flat and hierarchical topologies, we bridge the gap between sub-symbolic raw sensor data and high-level symbolic reasoning. The resulting map provides a stable, verifiable structure that knowledge-driven frameworks, ranging from knowledge graphs and ontologies to Large Language Models (LLMs), can directly exploit, enabling agents to operate with enhanced interpretability, trustworthiness, and alignment with human concepts.