In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment. The primary difficulty of IIN lies in recognizing the target object across varying viewpoints while rejecting potential distractors. Existing map-based navigation methods largely adopt Bird's Eye View (BEV) maps, which, however, do not represent the detailed textures in a scene. To address these issues, we propose a new Gaussian Splatting Navigation (GaussNav) framework for the IIN task, which constructs a novel map representation based on 3D Gaussian Splatting (3DGS). The proposed framework enables the agent not only to memorize the geometry and semantic information of the scene, but also to retain the textural features of objects. GaussNav significantly improves performance, raising Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset. Our code will be made publicly available.
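For readers unfamiliar with the metric, SPL as commonly defined in the embodied navigation literature averages, over all episodes, the success indicator multiplied by the ratio of the shortest-path length to the larger of the agent's path length and the shortest-path length. The following is a minimal sketch of that definition, not taken from the GaussNav code; the function name `spl` and its argument names are illustrative assumptions, and shortest-path lengths are assumed to be positive.

```python
from typing import Sequence


def spl(successes: Sequence[bool],
        shortest_lengths: Sequence[float],
        agent_lengths: Sequence[float]) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Illustrative helper (hypothetical name and signature).
    Assumes every shortest-path length is strictly positive.
    """
    if not (len(successes) == len(shortest_lengths) == len(agent_lengths)):
        raise ValueError("all inputs must have one entry per episode")
    if len(successes) == 0:
        raise ValueError("need at least one episode")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, agent_lengths):
        if success:
            # A successful episode contributes shortest / max(taken, shortest),
            # which is 1.0 for an optimal path and shrinks as the path gets longer.
            total += shortest / max(taken, shortest)
        # A failed episode contributes 0 regardless of path length.
    return total / len(successes)


if __name__ == "__main__":
    # Three episodes: optimal success, success with a path twice as long, failure.
    score = spl([True, True, False], [5.0, 5.0, 5.0], [5.0, 10.0, 3.0])
    print(f"SPL = {score:.3f}")
```

Under this definition, a higher SPL means the agent both succeeds more often and takes paths closer to the shortest one, which is why it is a stricter measure than success rate alone.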