Autonomous language-guided navigation in large-scale outdoor environments remains a key challenge in mobile robotics, owing to the difficulty of semantic reasoning, dynamic conditions, and long-term stability. We propose CausalNav, the first scene graph-based semantic navigation framework tailored to dynamic outdoor environments. Using LLMs, we construct a multi-level semantic scene graph, referred to as the Embodied Graph, which hierarchically integrates coarse-grained map data with fine-grained object entities. The constructed graph serves as a retrievable knowledge base for Retrieval-Augmented Generation (RAG), enabling semantic navigation and long-range planning under open-vocabulary queries. By fusing real-time perception with offline map data, the Embodied Graph supports robust navigation across varying spatial granularities in dynamic outdoor environments. Dynamic objects are explicitly handled in both the scene graph construction and hierarchical planning modules, and the Embodied Graph is continuously updated within a temporal window to reflect environmental changes and support real-time semantic navigation. Extensive experiments in both simulation and real-world settings demonstrate the superior robustness and efficiency of CausalNav.
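To make the ideas above concrete, the following is a minimal sketch (not the authors' implementation) of a two-level "Embodied Graph": coarse map regions holding fine-grained object entities, a temporal-window update that prunes stale dynamic objects, and a toy retrieval step standing in for the paper's LLM-based RAG. All class names, the token-overlap scorer, and the 30-second window are illustrative assumptions.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    name: str         # open-vocabulary label, e.g. "red mailbox"
    position: tuple   # (x, y) in the map frame
    dynamic: bool     # True for movable entities (cars, pedestrians, ...)
    last_seen: float  # timestamp of the most recent observation


@dataclass
class RegionNode:
    name: str                                   # coarse map region, e.g. "parking lot"
    objects: list = field(default_factory=list)  # fine-grained object entities


class EmbodiedGraph:
    """Hypothetical two-level scene graph fusing offline map regions with online detections."""

    def __init__(self, window_sec: float = 30.0):
        self.window_sec = window_sec              # temporal window for dynamic objects
        self.regions: dict = {}                   # region name -> RegionNode

    def observe(self, region: str, obj: ObjectNode):
        """Fuse a real-time detection into the graph (insert new entity or refresh existing one)."""
        node = self.regions.setdefault(region, RegionNode(region))
        for existing in node.objects:
            if existing.name == obj.name:
                existing.position, existing.last_seen = obj.position, obj.last_seen
                return
        node.objects.append(obj)

    def prune(self, now: float):
        """Drop dynamic objects not re-observed within the temporal window; keep static ones."""
        for region in self.regions.values():
            region.objects = [
                o for o in region.objects
                if not o.dynamic or now - o.last_seen <= self.window_sec
            ]

    def retrieve(self, query: str, k: int = 3):
        """Toy RAG-style retrieval: rank entities by token overlap with the query.
        A real system would embed the query and graph nodes (e.g. with an LLM)
        and pass the top-k matches to the planner as grounded context."""
        tokens = set(query.lower().split())
        scored = [
            (len(tokens & set(o.name.lower().split())), r.name, o)
            for r in self.regions.values() for o in r.objects
        ]
        scored.sort(key=lambda t: t[0], reverse=True)
        return [(region, obj) for score, region, obj in scored[:k] if score > 0]


if __name__ == "__main__":
    g = EmbodiedGraph(window_sec=30.0)
    now = time.time()
    g.observe("parking lot", ObjectNode("blue car", (12.0, 4.5), dynamic=True, last_seen=now - 60))
    g.observe("main entrance", ObjectNode("red mailbox", (3.2, 1.1), dynamic=False, last_seen=now))
    g.prune(now)  # the stale dynamic car is removed; the static mailbox remains
    print(g.retrieve("go to the red mailbox"))
```

In this sketch the temporal window only governs dynamic entities, so static map structure persists across updates while transient agents decay; this mirrors, in simplified form, the abstract's separation between offline map data and real-time perception.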