This paper presents the Embedding Pose Graph (EPG), a method that combines the strengths of foundation models with a simple 3D representation suited to robotics applications. Addressing the need for efficient spatial understanding in robotics, EPG provides a compact yet powerful representation by attaching foundation-model features to the nodes of a pose graph. Unlike traditional methods that rely on bulky data formats such as voxel grids or point clouds, EPG is lightweight and scalable. It supports a range of robotic tasks, including open-vocabulary querying, disambiguation, image-based querying, language-directed navigation, and re-localization in 3D environments. We showcase the effectiveness of EPG on these tasks, demonstrating its capacity to improve how robots interact with and navigate through complex spaces. Through both qualitative and quantitative assessments, we illustrate EPG's strong performance and its ability to outperform existing methods in re-localization. Our work marks a significant step toward enabling robots to efficiently understand and operate within large-scale 3D spaces.
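The core idea above can be illustrated with a minimal sketch: each pose-graph node stores a camera pose together with a foundation-model feature vector, and an open-vocabulary query retrieves the best-matching node by embedding similarity. The `EPGNode` structure, the tiny hand-made 3-D "embeddings", and the `query` helper are all hypothetical illustrations, not the paper's actual implementation (which would use real model embeddings such as CLIP features).

```python
import math
from dataclasses import dataclass

@dataclass
class EPGNode:
    pose: tuple       # (x, y, z, qx, qy, qz, qw) — node pose in the graph
    embedding: list   # feature vector from a foundation model (illustrative)

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def query(graph, query_embedding, top_k=1):
    """Return the top-k nodes whose embeddings best match the query."""
    ranked = sorted(graph, key=lambda n: cosine(n.embedding, query_embedding),
                    reverse=True)
    return ranked[:top_k]

# Toy graph with hand-made 3-D "embeddings" standing in for real features.
graph = [
    EPGNode(pose=(0, 0, 0, 0, 0, 0, 1), embedding=[1.0, 0.0, 0.0]),
    EPGNode(pose=(1, 0, 0, 0, 0, 0, 1), embedding=[0.0, 1.0, 0.0]),
]
best = query(graph, [0.9, 0.1, 0.0])[0]
print(best.pose[:3])  # position of the best-matching node
```

In a real system the same lookup would embed a text query with the foundation model's text encoder and compare it against image-derived node features, which is what makes the representation open-vocabulary while staying far smaller than a voxel grid or point cloud.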