Open-Vocabulary Object Navigation (OVON) requires an embodied agent to locate a language-specified target in an unknown environment. Existing zero-shot methods often reason over dense frontier points under incomplete observations, which leads to unstable route selection, repeated revisits, and unnecessary action overhead. We present DRIVE-Nav, a structured framework that organizes exploration around persistent directions rather than raw frontiers. By inspecting each encountered direction more completely and restricting subsequent decisions to still-relevant directions within a forward 240-degree view range, DRIVE-Nav reduces redundant revisits and improves path efficiency. The framework extracts and tracks directional candidates from weighted Fast Marching Method (FMM) paths, maintains representative views for semantic inspection, and combines vision-language-guided prompt enrichment with cross-frame verification to improve grounding reliability. Experiments on HM3D-OVON, HM3Dv2, and MP3D demonstrate strong overall performance and consistent efficiency gains. On HM3D-OVON, DRIVE-Nav achieves 50.2% Success Rate (SR) and 32.6% Success weighted by Path Length (SPL), improving on the previous best method by 1.9% SR and 5.6% SPL. It also delivers the best SPL on HM3Dv2 and MP3D, and real-world deployment on a physical humanoid robot demonstrates its effectiveness. Project page: https://coolmaoguo.github.io/drive-nav-page/
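To make the forward 240-degree restriction concrete, the following is a minimal sketch of how candidate directions might be filtered against the agent's heading. It assumes the view range is symmetric about the heading (±120 degrees) and that directions are plain yaw angles in radians; neither detail is stated in the abstract. The names `angular_offset` and `filter_forward_directions` are hypothetical and are not taken from the DRIVE-Nav code.

```python
import math


def angular_offset(heading_rad, direction_rad):
    """Signed smallest angle from heading to direction, wrapped to [-pi, pi]."""
    diff = direction_rad - heading_rad
    return math.atan2(math.sin(diff), math.cos(diff))


def filter_forward_directions(heading_rad, directions_rad, fov_deg=240.0):
    """Keep directions inside a forward view range centered on the heading.

    Assumption: the range is symmetric, so a 240-degree range keeps
    directions within +/-120 degrees of the heading.
    """
    half_range = math.radians(fov_deg) / 2.0
    return [
        d for d in directions_rad
        if abs(angular_offset(heading_rad, d)) <= half_range
    ]


if __name__ == "__main__":
    heading = math.radians(170.0)
    candidates = [math.radians(a) for a in (-170.0, 90.0, 0.0, 20.0)]
    kept = filter_forward_directions(heading, candidates)
    print([round(math.degrees(d)) for d in kept])
```

Wrapping the difference with `atan2` keeps the comparison correct across the ±180-degree boundary, so a direction at -170 degrees counts as only 20 degrees away from a heading of 170 degrees.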