WildOS: Open-Vocabulary Object Search in the Wild

Autonomous navigation in complex, unstructured outdoor environments requires robots to operate over long ranges without prior maps and with limited depth sensing. In such settings, relying solely on geometric frontiers for exploration is often insufficient; the ability to reason semantically about where to go and what is safe to traverse is crucial for robust, efficient exploration. This work presents WildOS, a unified system for long-range, open-vocabulary object search that combines safe geometric exploration with semantic visual reasoning. WildOS builds a sparse navigation graph to maintain spatial memory and uses a foundation-model-based vision module, ExploRFM, to score the frontier nodes of the graph. ExploRFM simultaneously predicts traversability, visual frontiers, and object similarity in image space, enabling real-time, onboard semantic navigation. The resulting vision-scored graph enables the robot to explore semantically meaningful directions while ensuring geometric safety. Furthermore, we introduce a particle-filter-based method for coarse localization of the open-vocabulary target query, which estimates candidate goal positions beyond the robot's immediate depth horizon and thereby enables effective planning toward distant goals. Extensive closed-loop field experiments across diverse off-road and urban terrains demonstrate that WildOS enables robust navigation, significantly outperforming purely geometric and purely vision-based baselines in both efficiency and autonomy. Our results highlight the potential of vision foundation models to drive open-world robotic behaviors that are both semantically informed and geometrically grounded. Project Page: https://leggedrobotics.github.io/wildos/
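The abstract does not specify the measurement model behind the particle-filter-based coarse localization. The following is a minimal sketch that assumes each detection of the target query yields only a world-frame bearing, for example the image column with the highest object-similarity response converted through a known camera model, with Gaussian bearing noise. Under that assumption, bearings collected from several robot poses triangulate a candidate goal position without any depth measurement. The class and parameter names (`BearingParticleFilter`, `bearing_std`, `jitter_std`) are hypothetical, and this is not the WildOS implementation.

```python
import numpy as np


def wrap_angle(a):
    """Wrap angles to [-pi, pi)."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi


class BearingParticleFilter:
    """Hypothetical 2D particle filter that localizes a target from bearing-only
    observations (an illustrative assumption, not the WildOS model)."""

    def __init__(self, center, radius, n_particles=2000, bearing_std=0.1,
                 jitter_std=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        # Uniform prior over a disc of `radius` metres around `center`.
        r = radius * np.sqrt(self.rng.random(n_particles))
        th = self.rng.uniform(-np.pi, np.pi, n_particles)
        self.particles = np.asarray(center, float) + np.stack(
            [r * np.cos(th), r * np.sin(th)], axis=1)
        self.weights = np.full(n_particles, 1.0 / n_particles)
        self.bearing_std = bearing_std
        self.jitter_std = jitter_std

    def update(self, robot_xy, bearing):
        """Reweight particles by a Gaussian likelihood on the bearing error."""
        d = self.particles - np.asarray(robot_xy, float)
        predicted = np.arctan2(d[:, 1], d[:, 0])
        err = wrap_angle(predicted - bearing)
        self.weights *= np.exp(-0.5 * (err / self.bearing_std) ** 2)
        self.weights += 1e-300  # guard against an all-zero weight vector
        self.weights /= self.weights.sum()
        # Resample only when the effective sample size drops below half.
        if self.effective_sample_size() < 0.5 * len(self.weights):
            self._resample()

    def effective_sample_size(self):
        return 1.0 / np.sum(self.weights ** 2)

    def _resample(self):
        """Systematic resampling followed by small Gaussian jitter."""
        n = len(self.weights)
        positions = (self.rng.random() + np.arange(n)) / n
        idx = np.searchsorted(np.cumsum(self.weights), positions)
        idx = np.minimum(idx, n - 1)  # clamp in case of round-off in cumsum
        self.particles = self.particles[idx] + self.rng.normal(
            0.0, self.jitter_std, self.particles.shape)
        self.weights = np.full(n, 1.0 / n)

    def estimate(self):
        """Weighted mean of the particle cloud, used as a candidate goal."""
        return self.weights @ self.particles
```

A single bearing only constrains the target to a ray, so the estimate becomes meaningful only after observations from sufficiently separated viewpoints. This is consistent with the abstract's framing of the output as a coarse candidate goal rather than a precise position.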
