Semantic maps allow a robot to reason about its surroundings in order to fulfill tasks such as navigating known environments, finding specific objects, and exploring unmapped areas. Traditional mapping approaches provide accurate geometric representations but are often constrained by pre-designed symbolic vocabularies. This reliance on fixed object classes makes it impractical to handle out-of-distribution knowledge that was not defined at design time. Recent advances in Vision-Language Foundation Models (VLFMs), such as CLIP, enable open-set mapping, in which objects are encoded as high-dimensional embeddings rather than fixed labels. In LIEREx, we integrate these VLFMs with established 3D Semantic Scene Graphs to enable target-directed exploration by an autonomous agent in partially unknown environments.