Embodied AI studies and develops intelligent systems that have a physical or virtual embodiment (e.g., robots) and can interact dynamically with their environment. Memory and control are the two essential components of an embodied system, and each usually requires its own modeling framework. In this paper, we propose LLM-Brain, a novel and generalizable framework that uses a large-scale language model (LLM) as a robotic brain to unify egocentric memory and control. LLM-Brain integrates multiple multimodal language models for robotic tasks in a zero-shot manner. All components communicate in natural language through closed-loop, multi-round dialogues that cover perception, planning, control, and memory. At the core of the system is an embodied LLM that maintains egocentric memory and controls the robot. We demonstrate LLM-Brain on two downstream tasks: active exploration and embodied question answering. Active exploration requires the robot to explore an unknown environment as extensively as possible within a limited number of actions. Embodied question answering requires the robot to answer questions based on observations acquired during prior exploration.
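The closed-loop, natural-language interaction described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the LLM-Brain implementation: `GridRobot` is a hypothetical toy environment whose `observe` method stands in for a vision-language captioner, `stub_llm` is a rule-based placeholder for a real language model call, and `answer_from_memory` is a keyword lookup standing in for querying the LLM with the stored memory as context.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GridRobot:
    """Hypothetical toy environment: a robot in a one-dimensional row of rooms."""
    rooms: List[str]
    position: int = 0

    def observe(self) -> str:
        # Stand-in for a captioning model: perception is returned as text.
        return f"I am in the {self.rooms[self.position]}."

    def act(self, action: str) -> None:
        # Only one movement is modeled; the robot stays put at the last room.
        if action == "move_forward" and self.position < len(self.rooms) - 1:
            self.position += 1


def stub_llm(prompt: str) -> str:
    """Placeholder for an LLM call (assumption): stop once observations repeat."""
    seen = [ln for ln in prompt.splitlines() if ln.startswith("Observation:")]
    if len(seen) >= 2 and seen[-1] == seen[-2]:
        return "stop"
    return "move_forward"


def explore(robot: GridRobot, llm: Callable[[str], str], max_actions: int) -> List[str]:
    """Closed loop: perceive, append to text memory, ask for an action, act."""
    memory: List[str] = []
    for _ in range(max_actions):
        memory.append(f"Observation: {robot.observe()}")
        action = llm("\n".join(memory) + "\nNext action?")
        memory.append(f"Action: {action}")
        if action == "stop":
            break
        robot.act(action)
    return memory


def answer_from_memory(memory: List[str], keyword: str) -> bool:
    """Toy question answering: was `keyword` mentioned in any stored observation?"""
    return any(keyword in m for m in memory if m.startswith("Observation:"))


if __name__ == "__main__":
    robot = GridRobot(rooms=["hallway", "kitchen", "bedroom"])
    log = explore(robot, stub_llm, max_actions=10)
    print("\n".join(log))
    print("Saw a kitchen?", answer_from_memory(log, "kitchen"))
```

In this sketch the memory is a plain list of natural-language lines that is re-sent to the model on every round, which is one simple way to keep perception, control, and memory in a single dialogue. The `max_actions` argument plays the role of the limited action budget in active exploration.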