Mobile agents require efficient exploration strategies to map unseen environments and autonomously plan tasks. Traditional methods rely on generating occupancy maps and optimizing the sequence in which unexplored regions are visited. However, in sensor-constrained settings, such as those limited to monocular cameras, generating accurate occupancy maps is challenging. To address this, we propose VANDERER, an exploration framework that leverages a Visual Curiosity Module (VCM) to guide pre-trained diffusion policies using only monocular image data. This curiosity module predicts the outcomes of proposed actions via a navigation world model and evaluates them through a curiosity cost. The cost then guides the diffusion process toward generating actions that maximize exploration. Evaluated across diverse simulated environments, VANDERER consistently outperforms established baselines, exploring an average of 13.4% more area than NoMaD. Our results reveal a direct correlation between visual and geometric curiosity in outdoor environments, demonstrating that VANDERER can effectively leverage this relationship for efficient exploration using sensor-constrained agents.
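The guidance loop described above (propose an action, score its predicted outcome with a curiosity cost, steer the denoising process toward lower cost) can be illustrated with a toy sketch. This is not VANDERER's implementation: every function name here is hypothetical, the "denoiser" is a simple pull toward a fixed preferred action, the "curiosity cost" is a squared distance to a made-up novel waypoint standing in for the world-model prediction and scoring, and finite differences are swapped in for whatever gradient mechanism the actual system uses.

```python
import random


def toy_denoise_step(action, prior_mean, strength=0.2):
    """Stand-in for one reverse-diffusion step of a pre-trained policy:
    pulls the noisy action toward the policy's preferred action."""
    return [a + strength * (m - a) for a, m in zip(action, prior_mean)]


def toy_curiosity_cost(action, novel_target):
    """Stand-in for world-model prediction plus curiosity scoring (assumption):
    lower cost means the predicted outcome is treated as more novel."""
    return sum((a - n) ** 2 for a, n in zip(action, novel_target))


def numerical_grad(cost_fn, action, eps=1e-4):
    """Central finite-difference gradient of cost_fn at action."""
    grad = []
    for i in range(len(action)):
        plus = list(action)
        minus = list(action)
        plus[i] += eps
        minus[i] -= eps
        grad.append((cost_fn(plus) - cost_fn(minus)) / (2 * eps))
    return grad


def guided_sample(prior_mean, novel_target, guidance_scale, steps=50, seed=0):
    """Denoise from pure noise, nudging each step down the cost gradient."""
    rng = random.Random(seed)
    action = [rng.gauss(0.0, 1.0) for _ in prior_mean]

    def cost_fn(candidate):
        return toy_curiosity_cost(candidate, novel_target)

    for t in range(steps):
        # 1) Policy step: move toward what the pre-trained policy prefers.
        action = toy_denoise_step(action, prior_mean)
        # 2) Guidance step: move against the gradient of the curiosity cost.
        grad = numerical_grad(cost_fn, action)
        action = [a - guidance_scale * g for a, g in zip(action, grad)]
        # 3) Re-inject noise that shrinks to zero by the final step.
        noise_level = 0.1 * (1 - (t + 1) / steps)
        action = [a + noise_level * rng.gauss(0.0, 1.0) for a in action]
    return action


if __name__ == "__main__":
    prior = [1.0, 0.0]   # hypothetical action favored by the policy
    novel = [0.0, 1.0]   # hypothetical action leading to the most novel view
    print("unguided:", guided_sample(prior, novel, guidance_scale=0.0))
    print("guided:  ", guided_sample(prior, novel, guidance_scale=0.1))
```

With `guidance_scale=0.0` the sample settles on the policy's preferred action; with a positive scale it settles between that action and the low-cost one, which is the trade-off that cost guidance is meant to control.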