SFCo-Nav: Efficient Zero-Shot Visual Language Navigation via Collaboration of Slow LLM and Fast Attributed Graph Alignment

Recent advances in large vision-language models (VLMs) and large language models (LLMs) have enabled zero-shot approaches to visual language navigation (VLN), where an agent follows natural language instructions using only egocentric perception and reasoning. However, existing zero-shot methods typically construct a naive observation graph and run VLM-LLM inference on it at every step, resulting in high latency and computation costs that limit real-time deployment. To address this, we present SFCo-Nav, an efficient zero-shot VLN framework inspired by the principle of slow-fast cognitive collaboration. SFCo-Nav integrates three key modules: 1) a slow LLM-based planner that produces a strategic chain of subgoals, each linked to an imagined object graph; 2) a fast reactive navigator for real-time object graph construction and subgoal execution; and 3) a lightweight asynchronous slow-fast bridge that aligns the structured, attributed imagined and perceived graphs to estimate navigation confidence, triggering the slow LLM planner only when necessary. To the best of our knowledge, SFCo-Nav is the first slow-fast collaborative zero-shot VLN system that supports asynchronous LLM triggering based on internal confidence. Evaluated on the public R2R and REVERIE benchmarks, SFCo-Nav matches or exceeds prior state-of-the-art zero-shot VLN success rates while cutting total token consumption per trajectory by over 50% and running more than 3.5 times faster. Finally, we demonstrate SFCo-Nav on a legged robot in a hotel suite, showcasing its efficiency and practicality in indoor environments.
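
The confidence-gated triggering described above can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the alignment procedure, so the node representation (`ObjectNode`), the similarity rule (label match plus Jaccard overlap of attributes), the averaging scheme, and the 0.5 threshold are all assumptions chosen for illustration, and graph edges are omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectNode:
    """Hypothetical attributed node: an object label plus a set of attributes."""
    label: str
    attributes: frozenset = field(default_factory=frozenset)


def node_similarity(imagined, perceived):
    """Assumed rule: 0 if labels differ, else 0.5 plus up to 0.5 for attribute overlap."""
    if imagined.label != perceived.label:
        return 0.0
    union = imagined.attributes | perceived.attributes
    if not union:
        return 1.0  # same label, no attributes on either side
    overlap = len(imagined.attributes & perceived.attributes) / len(union)
    return 0.5 + 0.5 * overlap


def alignment_confidence(imagined_nodes, perceived_nodes):
    """Average, over imagined nodes, of the best match among perceived nodes."""
    if not imagined_nodes:
        return 1.0  # nothing was expected, so nothing contradicts the plan
    best = [
        max((node_similarity(i, p) for p in perceived_nodes), default=0.0)
        for i in imagined_nodes
    ]
    return sum(best) / len(best)


def should_trigger_planner(confidence, threshold=0.5):
    """Call the slow planner only when confidence falls below the (assumed) threshold."""
    return confidence < threshold


imagined = [ObjectNode("sofa", frozenset({"red"})), ObjectNode("lamp")]
perceived = [ObjectNode("sofa", frozenset({"red", "leather"})), ObjectNode("table")]

confidence = alignment_confidence(imagined, perceived)
print(confidence, should_trigger_planner(confidence))
```

In this toy case the imagined sofa partially matches the perceived one and the imagined lamp is not observed at all, so the confidence falls below the threshold and the slow planner would be invoked. While confidence stays above the threshold, the fast navigator would keep executing the current subgoal without any LLM call, which is the source of the token and latency savings the abstract reports.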
