REST: Receding Horizon Explorative Steiner Tree for Zero-Shot Object-Goal Navigation

Zero-shot object-goal navigation (ZSON) requires navigating unknown environments to find a target object without task-specific training. Prior hierarchical training-free solutions invest in scene understanding (\textit{belief}) and high-level decision-making (\textit{policy}), yet overlook the design of \textit{option}, i.e., a subgoal candidate proposed from evolving belief and presented to policy for selection. In practice, options are reduced to isolated waypoints scored independently: single destinations hide the value gathered along the journey; an unstructured collection obscures the relationships among candidates. Our insight is that the option space should be a \textit{tree of paths}. Full paths expose en-route information gain that destination-only scoring systematically neglects; a tree of shared segments enables coarse-to-fine LLM reasoning that dismisses or pursues entire branches before examining individual leaves, compressing the combinatorial path space into an efficient hierarchy. We instantiate this insight in \textbf{REST} (Receding Horizon Explorative Steiner Tree), a training-free framework that (1) builds an explicit open-vocabulary 3D map from online RGB-D streams; (2) grows an agent-centric tree of safe and informative paths as the option space via sampling-based planning; and (3) textualizes each branch into a spatial narrative and selects the next-best path through chain-of-thought LLM reasoning. Across the Gibson, HM3D, and HSSD benchmarks, REST consistently ranks among the top methods in success rate while achieving the best or second-best path efficiency, demonstrating a favorable efficiency-success balance.
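
The contrast between destination-only scoring and path-level scoring over a tree can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the REST implementation: the `Node` class, the room names, and the numeric `gain` values are all hypothetical, and a simple numeric heuristic (`best_subtree_score`) stands in for the chain-of-thought LLM reasoning described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    gain: float  # hypothetical information gain collected at this node
    children: list[Node] = field(default_factory=list)


def leaf_paths(node, prefix=()):
    """Yield every root-to-leaf path as a tuple of nodes."""
    path = prefix + (node,)
    if not node.children:
        yield path
    for child in node.children:
        yield from leaf_paths(child, path)


def destination_score(path):
    """Score a path by its final waypoint only."""
    return path[-1].gain


def path_score(path):
    """Score a path by the gain accumulated along the whole route."""
    return sum(n.gain for n in path)


def best_subtree_score(node):
    """Best cumulative gain of any path from this node down to a leaf."""
    if not node.children:
        return node.gain
    return node.gain + max(best_subtree_score(c) for c in node.children)


def coarse_to_fine(root):
    """Pick a branch at each level before looking at individual leaves.

    Assumption: a numeric heuristic replaces the LLM's branch-level judgment.
    """
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=best_subtree_score)
        path.append(node)
    return path


# Hypothetical option tree rooted at the agent.
root = Node("agent", 0.0, [
    Node("hallway", 3.0, [Node("bedroom", 1.0), Node("closet", 0.5)]),
    Node("door", 0.0, [Node("kitchen", 2.5)]),
])

by_destination = [n.name for n in max(leaf_paths(root), key=destination_score)]
by_path = [n.name for n in max(leaf_paths(root), key=path_score)]
by_tree = [n.name for n in coarse_to_fine(root)]

print("destination-only:", by_destination)
print("full path:       ", by_path)
print("coarse-to-fine:  ", by_tree)
```

In this toy tree, destination-only scoring prefers the kitchen because its leaf has the highest gain, while path-level scoring prefers the route through the hallway, whose en-route gain outweighs the weaker final waypoint. The coarse-to-fine descent reaches the same path while comparing only two branches at the first level and two leaves at the second.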
