Empowering Robotics with Large Language Models: osmAG Map Comprehension with LLMs

Recently, Large Language Models (LLMs) have demonstrated great potential in robotic applications by providing essential general knowledge for situations that cannot be pre-programmed. Mobile robots generally need to understand maps to execute tasks such as localization or navigation. In this letter, we address the problem of enabling LLMs to comprehend Area Graph, a text-based map representation, in order to enhance their applicability in mobile robotics. Area Graph is a hierarchical, topometric semantic map representation that uses polygons to demarcate areas such as rooms, corridors, or buildings. In contrast to commonly used map representations, such as occupancy grid maps or point clouds, osmAG (Area Graph in OpenStreetMap format) is stored in an XML textual format that is naturally readable by LLMs. Furthermore, conventional robotic algorithms such as localization and path planning are compatible with osmAG, making this map representation comprehensible to LLMs, traditional robotic algorithms, and humans. Our experiments show that, with a proper map representation, LLMs are capable of understanding maps and answering queries based on that understanding. After simple fine-tuning, LLaMA2 models surpassed ChatGPT-3.5 in tasks involving topology and hierarchy understanding. Our dataset, dataset generation code, and fine-tuned LoRA adapters can be accessed at https://github.com/xiefujing/LLM-osmAG-Comprehension.
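
To make the idea of a text-based, hierarchical polygon map concrete, the following is a minimal sketch of reading such a map in Python. The `node`, `way`, `nd`, and `tag` elements follow the standard OpenStreetMap XML layout; the `osmAG:*` tag keys, the area names, and the coordinates are illustrative assumptions and are not taken from the released dataset. The helper functions `parse_osmag`, `children_of`, and `neighbors_of` are hypothetical names introduced for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical osmAG-style fragment. Element names are standard OSM XML;
# the "osmAG:*" keys and all values below are assumptions for illustration.
OSMAG_XML = """<osm version="0.6">
  <node id="-1" lat="31.17900" lon="121.59000"/>
  <node id="-2" lat="31.17900" lon="121.59010"/>
  <node id="-3" lat="31.17910" lon="121.59010"/>
  <node id="-4" lat="31.17910" lon="121.59000"/>
  <node id="-5" lat="31.17920" lon="121.59010"/>
  <node id="-6" lat="31.17920" lon="121.59000"/>
  <way id="-10">
    <nd ref="-1"/><nd ref="-2"/><nd ref="-5"/><nd ref="-6"/><nd ref="-1"/>
    <tag k="name" v="floor_1"/>
    <tag k="osmAG:type" v="area"/>
  </way>
  <way id="-11">
    <nd ref="-1"/><nd ref="-2"/><nd ref="-3"/><nd ref="-4"/><nd ref="-1"/>
    <tag k="name" v="room_101"/>
    <tag k="osmAG:type" v="area"/>
    <tag k="osmAG:parent" v="floor_1"/>
  </way>
  <way id="-12">
    <nd ref="-4"/><nd ref="-3"/><nd ref="-5"/><nd ref="-6"/><nd ref="-4"/>
    <tag k="name" v="corridor"/>
    <tag k="osmAG:type" v="area"/>
    <tag k="osmAG:parent" v="floor_1"/>
  </way>
  <way id="-20">
    <nd ref="-3"/><nd ref="-4"/>
    <tag k="osmAG:type" v="passage"/>
    <tag k="osmAG:from" v="room_101"/>
    <tag k="osmAG:to" v="corridor"/>
  </way>
</osm>"""


def parse_osmag(xml_text):
    """Return (areas, passages) parsed from an osmAG-style XML string."""
    root = ET.fromstring(xml_text)
    # Map node id -> (lat, lon) so ways can be resolved to polygons.
    nodes = {n.get("id"): (float(n.get("lat")), float(n.get("lon")))
             for n in root.iter("node")}
    areas, passages = {}, []
    for way in root.iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.iter("tag")}
        kind = tags.get("osmAG:type")
        if kind == "area":
            areas[tags["name"]] = {
                "parent": tags.get("osmAG:parent"),  # None for a top-level area
                "polygon": [nodes[nd.get("ref")] for nd in way.iter("nd")],
            }
        elif kind == "passage":
            passages.append((tags["osmAG:from"], tags["osmAG:to"]))
    return areas, passages


def children_of(areas, parent_name):
    """Hierarchy query: names of areas directly contained in parent_name."""
    return sorted(name for name, a in areas.items() if a["parent"] == parent_name)


def neighbors_of(passages, area_name):
    """Topology query: names of areas connected to area_name by a passage."""
    linked = set()
    for a, b in passages:
        if a == area_name:
            linked.add(b)
        elif b == area_name:
            linked.add(a)
    return sorted(linked)


if __name__ == "__main__":
    areas, passages = parse_osmag(OSMAG_XML)
    print(children_of(areas, "floor_1"))
    print(neighbors_of(passages, "room_101"))
```

The two query helpers correspond loosely to the hierarchy and topology questions mentioned in the abstract: which areas belong to a given parent, and which areas are reachable through a shared passage. Because the same text can be parsed by conventional code and also placed in an LLM prompt, a sketch like this could serve as a ground-truth check for model answers, under the tag-naming assumptions stated above.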

