From Language to Action: Can LLM-Based Agents Be Used for Embodied Robot Cognition?

To act flexibly in an everyday environment, a robotic agent needs a variety of cognitive capabilities that enable it to reason about plans and recover from execution failures. Large language models (LLMs) have been shown to exhibit emergent cognitive abilities, such as reasoning and language understanding; however, controlling an embodied robotic agent requires reliably bridging high-level language and the low-level functionalities for perception and control. In this paper, we investigate the extent to which an LLM can serve as a core component for planning and execution reasoning in a cognitive robot architecture. For this purpose, we propose a cognitive architecture in which an agentic LLM serves as the core component for planning and reasoning, while working and episodic memory components support learning from experience and adaptation. An instance of the architecture is then used to control a mobile manipulator in a simulated household environment, where the LLM-based agent interacts with the environment through a set of high-level tools for perception, reasoning, navigation, grasping, and placement. We evaluate the proposed system on two household tasks (object placement and object swapping), which test the agent's reasoning, planning, and memory utilisation. The results demonstrate that the LLM-driven agent can complete structured tasks and exhibits emergent adaptation and memory-guided planning, but they also reveal significant limitations, such as hallucinations about task success and poor instruction following, in which the agent refuses to acknowledge and complete sequential tasks. These findings highlight both the potential and the challenges of employing LLMs as embodied cognitive controllers for autonomous robots.
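The architecture described above (a planner that selects high-level tools, a working memory for the current episode, and an episodic memory of past episodes) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `World`, `Agent`, `TOOLS`, and `scripted_policy` are hypothetical, the simulated household is reduced to a dictionary, and a fixed scripted plan stands in for the LLM so that the example runs without any model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# All names here are hypothetical and illustrative; none come from the paper.

@dataclass
class World:
    """Toy stand-in for the simulated household: object name -> location."""
    locations: Dict[str, str]
    robot_at: str = "dock"
    holding: str = ""

def perceive(world: World, _arg: str) -> str:
    visible = sorted(o for o, loc in world.locations.items() if loc == world.robot_at)
    return f"at {world.robot_at}, sees {visible}"

def navigate(world: World, target: str) -> str:
    world.robot_at = target
    return f"moved to {target}"

def grasp(world: World, obj: str) -> str:
    # Fails if the gripper is occupied or the object is not at the robot's location.
    if world.holding or world.locations.get(obj) != world.robot_at:
        return f"failed to grasp {obj}"
    world.holding = obj
    world.locations[obj] = "gripper"
    return f"grasped {obj}"

def place(world: World, _arg: str) -> str:
    if not world.holding:
        return "nothing to place"
    obj, world.holding = world.holding, ""
    world.locations[obj] = world.robot_at
    return f"placed {obj} at {world.robot_at}"

TOOLS = {"perceive": perceive, "navigate": navigate, "grasp": grasp, "place": place}
Action = Tuple[str, str]  # (tool name, argument)

@dataclass
class Agent:
    # The policy is a placeholder for the LLM: it maps the task and both
    # memories to the next tool call.
    policy: Callable[[str, List[str], List[str]], Action]
    working_memory: List[str] = field(default_factory=list)   # current episode
    episodic_memory: List[str] = field(default_factory=list)  # past episodes

    def run(self, task: str, world: World, max_steps: int = 10) -> List[str]:
        self.working_memory = []
        for _ in range(max_steps):
            tool, arg = self.policy(task, self.working_memory, self.episodic_memory)
            if tool == "done":
                break
            result = TOOLS[tool](world, arg)
            self.working_memory.append(f"{tool}({arg}) -> {result}")
        # Store a summary of the episode so later runs could consult it.
        self.episodic_memory.append(f"{task}: " + "; ".join(self.working_memory))
        return list(self.working_memory)

def scripted_policy(task: str, working: List[str], episodic: List[str]) -> Action:
    """Fixed plan for 'put cup on table'; a real system would query an LLM here."""
    plan = [("navigate", "shelf"), ("grasp", "cup"),
            ("navigate", "table"), ("place", ""), ("done", "")]
    return plan[min(len(working), len(plan) - 1)]

world = World(locations={"cup": "shelf"})
agent = Agent(policy=scripted_policy)
trace = agent.run("put cup on table", world)

# Check success against the world state rather than the agent's own report.
task_succeeded = world.locations["cup"] == "table"
```

Checking `task_succeeded` against the world state, instead of trusting the agent's own claim, is one simple way to guard against the hallucinated task success mentioned in the abstract; whether the paper's system does anything similar is not stated here.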
