Visual navigation typically assumes that at least one obstacle-free path exists between start and goal, which the robot must discover or plan. In real-world settings such as homes and warehouses, however, clutter can block all routes. Targeting such cases, we introduce the Lifelong Interactive Navigation problem, in which a mobile robot with manipulation abilities moves clutter to forge its own path while completing sequential object-placement tasks, each requiring a given object (e.g., alarm clock, pillow) to be placed onto a target object (e.g., dining table, desk, bed). To address this lifelong setting, where environment changes accumulate and persist across tasks, we propose an LLM-driven, constraint-based planning framework with active perception. Our framework lets the LLM reason over a structured scene graph of discovered objects and obstacles to decide which object to move, where to place it, and where to look next to uncover task-relevant information. This coupling of reasoning and active perception allows the agent to explore only the regions expected to contribute to task completion rather than exhaustively mapping the environment. A standard motion planner then executes the corresponding navigate-pick-place or detour sequence, ensuring reliable low-level control. Evaluated in the physics-enabled ProcTHOR-10k simulator, our approach outperforms both non-learning and learning-based baselines. We further demonstrate our approach qualitatively on real-world hardware.
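The decision loop described above, reasoning over a scene graph to choose among looking, moving clutter, and placing, can be sketched minimally as follows. All names here (`SceneGraph`, `next_action`, the action strings) are illustrative stand-ins, not the paper's actual API; the rule-based `next_action` merely mimics the role the LLM plays in the framework.

```python
# Hedged sketch of one lifelong interactive-navigation decision step:
# the planner inspects a structured scene graph and picks among
# active perception (look), clearing clutter (move), and task
# completion (place). Names are hypothetical, for illustration only.

from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Discovered objects/obstacles; a stand-in for the structured scene graph."""
    objects: dict = field(default_factory=dict)  # name -> category
    blocking: list = field(default_factory=list)  # clutter on the current path


def next_action(graph: SceneGraph, task_obj: str, target: str) -> str:
    """Rule-based stand-in for the LLM's high-level decision."""
    if target not in graph.objects:
        # Target not yet discovered: look where it is expected to be.
        return f"look(region_near_{target})"
    if graph.blocking:
        # All routes blocked: move the first obstructing object aside.
        return f"move({graph.blocking[0]}, free_space)"
    # Path clear and target known: execute the pick-and-place.
    return f"place({task_obj}, {target})"


g = SceneGraph(objects={"AlarmClock": "object"}, blocking=["Box"])
print(next_action(g, "AlarmClock", "Desk"))  # -> look(region_near_Desk)
g.objects["Desk"] = "receptacle"
print(next_action(g, "AlarmClock", "Desk"))  # -> move(Box, free_space)
g.blocking.clear()
print(next_action(g, "AlarmClock", "Desk"))  # -> place(AlarmClock, Desk)
```

In the actual framework each decision is produced by prompting the LLM with the serialized scene graph, and the chosen action is handed to a standard motion planner for low-level execution.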