Recently, Large Language Models (LLMs) have demonstrated great potential in robotic applications by providing essential general knowledge for situations that cannot be pre-programmed. Mobile robots generally need to understand maps to execute tasks such as localization or navigation. In this letter, we address the problem of enabling LLMs to comprehend Area Graph, a text-based map representation, in order to enhance their applicability in mobile robotics. Area Graph is a hierarchical, topometric semantic map representation that uses polygons to demarcate areas such as rooms, corridors, or buildings. In contrast to commonly used map representations such as occupancy grid maps or point clouds, osmAG (Area Graph in OpenStreetMap format) is stored in an XML textual format that is naturally readable by LLMs. Furthermore, conventional robotic algorithms such as localization and path planning are compatible with osmAG, which makes this map representation comprehensible to LLMs, traditional robotic algorithms, and humans alike. Our experiments show that, with a proper map representation, LLMs are capable of understanding maps and answering queries based on that understanding. After simple fine-tuning, LLaMA2 models surpassed ChatGPT-3.5 in tasks involving topology and hierarchy understanding. Our dataset, dataset generation code, and fine-tuned LoRA adapters can be accessed at https://github.com/xiefujing/LLM-osmAG-Comprehension.
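To make the idea of a text-based, hierarchical area map concrete, the following is a minimal sketch of how areas stored as polygons in OpenStreetMap-style XML could be read and queried for hierarchy. The XML fragment is hand-written for illustration, and the tag keys `name`, `area_type`, and `parent`, as well as the helper names `parse_areas` and `children_of`, are assumptions rather than the actual osmAG schema or the authors' code.

```python
import xml.etree.ElementTree as ET

# Hypothetical map fragment in OpenStreetMap XML style (nodes + ways).
# The tag keys "name", "area_type", and "parent" are illustrative
# assumptions, not the real osmAG schema.
OSM_XML = """<osm version="0.6">
  <node id="1" lat="0.0" lon="0.0"/>
  <node id="2" lat="0.0" lon="1.0"/>
  <node id="3" lat="1.0" lon="1.0"/>
  <node id="4" lat="1.0" lon="0.0"/>
  <node id="5" lat="2.0" lon="0.0"/>
  <node id="6" lat="2.0" lon="1.0"/>
  <way id="100">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/><nd ref="4"/><nd ref="1"/>
    <tag k="name" v="room_101"/>
    <tag k="area_type" v="room"/>
    <tag k="parent" v="floor_1"/>
  </way>
  <way id="101">
    <nd ref="4"/><nd ref="3"/><nd ref="6"/><nd ref="5"/><nd ref="4"/>
    <tag k="name" v="corridor_1"/>
    <tag k="area_type" v="corridor"/>
    <tag k="parent" v="floor_1"/>
  </way>
  <way id="102">
    <nd ref="1"/><nd ref="2"/><nd ref="6"/><nd ref="5"/><nd ref="1"/>
    <tag k="name" v="floor_1"/>
    <tag k="area_type" v="floor"/>
  </way>
</osm>"""


def parse_areas(xml_text):
    """Return {area name: {"type", "parent", "polygon"}} from OSM-style XML."""
    root = ET.fromstring(xml_text)
    # Map node ids to (lat, lon) so ways can be resolved into polygons.
    nodes = {
        n.get("id"): (float(n.get("lat")), float(n.get("lon")))
        for n in root.findall("node")
    }
    areas = {}
    for way in root.findall("way"):
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        polygon = [nodes[nd.get("ref")] for nd in way.findall("nd")]
        areas[tags["name"]] = {
            "type": tags.get("area_type"),
            "parent": tags.get("parent"),  # None for top-level areas
            "polygon": polygon,
        }
    return areas


def children_of(areas, parent_name):
    """List the names of areas whose parent is `parent_name`, sorted."""
    return sorted(
        name for name, area in areas.items() if area["parent"] == parent_name
    )


areas = parse_areas(OSM_XML)
print(children_of(areas, "floor_1"))
```

Because the same text can be parsed deterministically in this way or placed directly in an LLM prompt, a hierarchy question such as "which areas belong to floor_1?" has a ground-truth answer that a model's response can be checked against.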