Embodied AI studies and develops intelligent systems that have a physical or virtual embodiment (such as robots) and can interact dynamically with their environment. Memory and control are the two essential components of an embodied system, and they are usually modeled by separate frameworks. In this paper, we propose a novel and generalizable framework called LLM-Brain, which uses a large-scale language model (LLM) as a robotic brain to unify egocentric memory and control. The LLM-Brain framework integrates multiple multimodal language models for robotic tasks in a zero-shot manner. All components within LLM-Brain communicate in natural language through closed-loop, multi-round dialogues that encompass perception, planning, control, and memory. The core of the system is an embodied LLM that maintains egocentric memory and controls the robot. We demonstrate LLM-Brain on two downstream tasks: active exploration and embodied question answering. Active exploration requires the robot to explore an unknown environment as extensively as possible within a limited number of actions. Embodied question answering requires the robot to answer questions based on observations acquired during prior exploration.
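To make the closed loop described above more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that perception and decision-making can each be wrapped as a function that exchanges plain text, and it replaces the multimodal models and the embodied LLM with toy stubs. All names here (`DialogueLoop`, `toy_describe`, `toy_decide`) are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-ins. In a real system these would be calls to a
# vision-language model and to an LLM; none of these names come from the paper.
Describe = Callable[[int], str]            # step index -> text description of the view
Decide = Callable[[List[str], str], str]   # (memory, observation) -> action as text


@dataclass
class DialogueLoop:
    describe: Describe
    decide: Decide
    # Egocentric memory kept as plain natural-language entries (an assumption).
    memory: List[str] = field(default_factory=list)

    def step(self, t: int) -> str:
        observation = self.describe(t)                    # perception
        action = self.decide(self.memory, observation)    # planning and control
        # Record what was seen and done so later rounds can refer back to it.
        self.memory.append(f"step {t}: saw '{observation}', did '{action}'")
        return action

    def run(self, max_actions: int) -> List[str]:
        # Active exploration under a fixed action budget.
        return [self.step(t) for t in range(max_actions)]

    def answer(self, keyword: str) -> List[str]:
        # Toy question answering: return memory entries that mention the keyword.
        return [entry for entry in self.memory if keyword in entry]


def toy_describe(t: int) -> str:
    # Stub perception: cycles through three canned scene descriptions.
    views = ["a hallway", "a kitchen with a red kettle", "a hallway"]
    return views[t % len(views)]


def toy_decide(memory: List[str], observation: str) -> str:
    # Stub policy: turn when the same view is already in memory, else go forward.
    seen_before = any(f"saw '{observation}'" in entry for entry in memory)
    return "turn left" if seen_before else "move forward"


if __name__ == "__main__":
    loop = DialogueLoop(toy_describe, toy_decide)
    print(loop.run(3))
    print(loop.answer("kettle"))
```

The sketch keeps memory as a list of text entries so that the same record can serve both the exploration policy and later question answering, which mirrors the unification the abstract describes only at a schematic level.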