Object goal navigation (ObjectNav) in unseen environments is a fundamental task for Embodied AI. Agents in existing works learn ObjectNav policies based on 2D maps, scene graphs, or image sequences. Because this task takes place in 3D space, a 3D-aware agent can advance its ObjectNav capability by learning from fine-grained spatial information. However, leveraging 3D scene representations can be prohibitively impractical for policy learning in this floor-level task, due to low sample efficiency and expensive computational cost. In this work, we propose a framework for the challenging 3D-aware ObjectNav based on two straightforward sub-policies. The two sub-policies, namely a corner-guided exploration policy and a category-aware identification policy, run simultaneously, using online-fused 3D points as observations. Through extensive experiments, we show that this framework can dramatically improve ObjectNav performance by learning from 3D scene representations. Our framework achieves the best performance among all modular methods on the Matterport3D and Gibson datasets, while requiring up to 30x less computational cost for training.