Zero-shot Object Navigation (ZSON) has shown promise for open-vocabulary target search in unseen environments, yet most existing systems remain tied to planar representations and single-floor assumptions. These assumptions become inadequate in real buildings, where navigation involves floors, stairs, landings, and vertically overlapping spaces. This article presents TravExplorer, a cross-floor embodied exploration framework that couples zero-shot semantic guidance with traversability-aware 3-D planning. TravExplorer maintains a unified volumetric map that distinguishes occupied structures from robot-reachable support surfaces and extracts traversable frontiers from connected support surfaces, including floors, stairs, and landings. A FOV-aware active perception strategy further resolves incomplete observations during cross-floor traversal. To reduce semantic-reasoning latency, a lightweight guidance module aligns a probabilistic instance map from online open-vocabulary segmentation with a spatial value map from fast image-to-text matching. Based on these geometric and semantic memories, a hierarchical planner performs target-aware frontier touring over object hypotheses, traversable frontiers, and stair landmarks, and generates executable cross-floor motions through foothold-guided 3-D search and vertically constrained local trajectory optimization. Experiments over 4,195 simulated episodes on HM3D and MP3D demonstrate consistent advantages over representative ObjectNav baselines. Fifty real-world trials on a Unitree Go2 further validate open-vocabulary target search across single-floor and cross-floor indoor environments without prior maps or human intervention. The code will be released at https://github.com/wuyi2121/TravExplorer.
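To make the idea of traversable frontiers concrete, the following is a minimal sketch, not the authors' implementation. It assumes the volumetric map has already been reduced to two sets of integer voxel indices: `support` (cells the robot can stand on) and `unknown` (unobserved cells). The function name `traversable_frontiers`, the `max_step` parameter, and the 4-connected neighbourhood are all hypothetical choices made for illustration. A support cell counts as a frontier only if it is reachable from the robot through connected support cells and borders an unknown cell.

```python
from collections import deque


def traversable_frontiers(support, unknown, start, max_step=1):
    """Return reachable support voxels that border unknown space.

    support  -- set of (x, y, z) voxel indices the robot can stand on (assumed input)
    unknown  -- set of (x, y, z) voxel indices not yet observed (assumed input)
    start    -- the robot's current support voxel
    max_step -- largest height change, in voxels, between neighbouring cells
                (a hypothetical stand-in for the robot's step capability)
    """
    if start not in support:
        return set()

    reachable = {start}
    queue = deque([start])
    frontiers = set()

    # Breadth-first search over connected support cells. Allowing a small
    # height change per move lets the search climb stairs and reach landings.
    while queue:
        x, y, z = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            for dz in range(-max_step, max_step + 1):
                nb = (x + dx, y + dy, z + dz)
                if nb in support:
                    if nb not in reachable:
                        reachable.add(nb)
                        queue.append(nb)
                elif nb in unknown:
                    # The current cell is reachable and touches unknown space.
                    frontiers.add((x, y, z))
    return frontiers


# Toy scene: a ground floor, two stair steps, an upper landing, and an
# isolated platform at z = 5 that no stair connects to.
support = {(0, 0, 0), (1, 0, 0), (2, 0, 0),   # ground floor
           (3, 0, 1), (4, 0, 2),              # stairs
           (5, 0, 2),                         # landing
           (0, 0, 5)}                         # disconnected platform
unknown = {(6, 0, 2), (1, 0, 5)}

print(traversable_frontiers(support, unknown, start=(0, 0, 0)))
```

In this toy scene the only frontier is the landing cell `(5, 0, 2)`. The platform at `z = 5` also borders unknown space, but it is excluded because no chain of support cells connects it to the robot. Setting `max_step=0` mimics a single-floor, planar assumption: the stairs become impassable and no frontier is found.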