Indoor mobile robot navigation requires fast responsiveness and robust semantic understanding, yet existing methods struggle to provide both. Classical geometric approaches such as SLAM offer reliable localization but depend on detailed maps and cannot interpret human-targeted cues (e.g., signs, room numbers) essential for indoor reasoning. Vision-Language-Action (VLA) models introduce semantic grounding but remain strictly reactive, basing decisions only on visible frames and failing to anticipate unseen intersections or reason about distant textual cues. Vision-Language Models (VLMs) provide richer contextual inference but suffer from high computational latency, making them unsuitable for real-time operation on embedded platforms. In this work, we present IROS, a real-time navigation framework that combines VLM-level contextual reasoning with the efficiency of lightweight perceptual modules on low-cost, on-device hardware. Inspired by Dual Process Theory, IROS separates fast reflexive decisions (System One) from slow deliberative reasoning (System Two), invoking the VLM only when necessary. Furthermore, by augmenting compact VLMs with spatial and textual cues, IROS delivers robust, human-like navigation with minimal latency. Across five real-world buildings, IROS improves decision accuracy and reduces latency by 66% compared to continuous VLM-based navigation.
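The dual-process routing described above can be sketched as a simple dispatcher: a lightweight reflexive policy handles ordinary frames, and the latency-heavy VLM is queried only at semantically ambiguous moments such as intersections or visible textual cues. This is a minimal illustration under assumed names (`Observation`, `fast_policy`, `needs_deliberation`, `slow_vlm_policy`, `decide`); it is not the paper's actual API.

```python
from dataclasses import dataclass

# Hypothetical sketch of System One / System Two routing; all names and
# fields are assumptions for illustration, not IROS's real interfaces.

@dataclass
class Observation:
    corridor_clear: bool      # from a lightweight geometric check
    intersection_ahead: bool  # candidate branching point detected
    text_cue_detected: bool   # e.g., a sign or room number in view

def fast_policy(obs: Observation) -> str:
    """System One: cheap reflexive decision from lightweight perception."""
    return "forward" if obs.corridor_clear else "stop"

def needs_deliberation(obs: Observation) -> bool:
    """Gate: escalate to the slow path only when semantics matter."""
    return obs.intersection_ahead or obs.text_cue_detected

def slow_vlm_policy(obs: Observation) -> str:
    """System Two: stand-in for a latency-heavy VLM query.

    A real implementation would prompt a compact VLM augmented with
    spatial and textual cues; here a placeholder action is returned.
    """
    return "turn_left"

def decide(obs: Observation) -> str:
    """Invoke the VLM only when necessary; otherwise act reflexively."""
    if needs_deliberation(obs):
        return slow_vlm_policy(obs)
    return fast_policy(obs)
```

Because `needs_deliberation` is true only at intersections or when text cues appear, the expensive path is skipped on most frames, which is the source of the latency savings the abstract reports.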