We present a novel approach to the ObjectNav task for non-stationary and potentially occluded targets in indoor environments. We refer to this task as Portable ObjectNav (P-ObjectNav), and in this work we present its formulation, an analysis of its feasibility, and a navigation benchmark built on a novel memory-enhanced LLM-based policy. In contrast to ObjectNav, where target object locations are fixed for each episode, P-ObjectNav tackles the challenging case in which target objects move during the episode. This adds a layer of time-sensitivity to navigation and is particularly relevant when the agent must find portable targets (e.g., misplaced wallets) in human-centric environments. For visual grounding, the agent must estimate not only the correct location of the target but also the time at which the target is at that location, which raises the question of whether the task is feasible at all. We address this concern by comparing results in two object placement settings: one in which objects are placed according to a routine or a path, and one in which they are placed at random. We dynamize Matterport3D for these experiments and modify PPO and LLM-based navigation policies for evaluation. With PPO, we observe that agent performance in the random setting stagnates, while the agent in the routine-following environment continues to improve, from which we infer that P-ObjectNav is solvable in environments with routine-following object placement. Using memory enhancement on an LLM-based policy, we establish a benchmark for P-ObjectNav. Our memory-enhanced agent significantly outperforms its non-memory-based counterpart across object placement scenarios, by 71.76% and 74.68% on average in Success Rate (SR) and Success Rate weighted by Path Length (SRPL), respectively, showing the influence of memory on P-ObjectNav performance. Our code and dataset will be made publicly available.
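The two reported metrics can be sketched as follows. This is a minimal illustration, assuming SRPL follows the standard SPL formula, the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is the success indicator, l_i the shortest-path length, and p_i the length of the path the agent actually took. The `Episode` type and field names are hypothetical. How l_i is defined for a moving target (for example, measured to the target's position at a particular time) is not specified here and should be treated as an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    # Hypothetical per-episode record; field names are illustrative only.
    success: bool          # did the episode end in success?
    shortest_path: float   # l_i: reference shortest-path length (assumed definition for a moving target)
    agent_path: float      # p_i: length of the path the agent actually took


def success_rate(episodes: Sequence[Episode]) -> float:
    """SR: fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(e.success for e in episodes) / len(episodes)


def srpl(episodes: Sequence[Episode]) -> float:
    """SRPL, assumed equal to standard SPL: mean of S_i * l_i / max(p_i, l_i)."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if e.success:
            # Successful episodes are discounted by how much longer the agent's path was
            # than the reference path; failed episodes contribute 0.
            total += e.shortest_path / max(e.agent_path, e.shortest_path)
    return total / len(episodes)


if __name__ == "__main__":
    eps = [
        Episode(success=True, shortest_path=5.0, agent_path=10.0),  # success, path twice as long
        Episode(success=False, shortest_path=4.0, agent_path=8.0),  # failure contributes 0
        Episode(success=True, shortest_path=3.0, agent_path=3.0),   # success on an optimal path
    ]
    print(f"SR = {success_rate(eps):.3f}, SRPL = {srpl(eps):.3f}")
```

Because SRPL penalizes only successful episodes with inefficient paths, it is always at most SR. A gap between the two therefore indicates detours rather than outright failures.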