Object goal navigation (ObjectNav) in unseen environments is a fundamental task for Embodied AI. Agents in existing works learn ObjectNav policies based on 2D maps, scene graphs, or image sequences. Because this task takes place in 3D space, a 3D-aware agent can advance its ObjectNav capability by learning from fine-grained spatial information. However, leveraging 3D scene representations can be prohibitively impractical for policy learning in this floor-level task, due to low sample efficiency and expensive computational cost. In this work, we propose a framework for the challenging 3D-aware ObjectNav task based on two straightforward sub-policies. The two sub-policies, namely a corner-guided exploration policy and a category-aware identification policy, run simultaneously, taking online-fused 3D points as observations. Through extensive experiments, we show that this framework can dramatically improve ObjectNav performance by learning from 3D scene representations. Our framework achieves the best performance among all modular methods on the Matterport3D and Gibson datasets, while requiring up to 30x less computational cost for training.