Open-world navigation requires robots to make decisions in complex everyday environments while adapting to flexible task requirements. Conventional navigation approaches often rely on dense 3D reconstruction and hand-crafted goal metrics, which limits their generalization across tasks and environments. Recent advances in vision--language navigation (VLN) and vision--language--action (VLA) models enable end-to-end policies conditioned on natural language, but typically require interactive training, large-scale data collection, or task-specific fine-tuning with a mobile agent. We formulate navigation as a sparse subgoal identification and reaching problem and observe that providing visual anchoring targets for high-level semantic priors enables highly efficient goal-conditioned navigation. Based on this insight, we select navigation frontiers as semantic anchors and propose OpenFrontier, a training-free navigation framework that seamlessly integrates diverse vision--language prior models. OpenFrontier enables efficient navigation with a lightweight system design, without dense 3D mapping, policy training, or model fine-tuning. We evaluate OpenFrontier across multiple navigation benchmarks and demonstrate strong zero-shot performance, as well as effective real-world deployment on a mobile robot.