Large language models (LLMs) have emerged as the dominant tool for robotic task planning from natural language instructions. However, because they are trained on general internet data, LLMs are not inherently aligned with the embodiment, skill sets, and limitations of real-world robotic systems. Inspired by the emerging paradigm of verbal reinforcement learning, in which LLM agents improve through self-reflection and few-shot learning without parameter updates, we introduce PragmaBot, a framework that enables robots to learn task planning through real-world experience. PragmaBot employs a vision-language model (VLM) as the robot's "brain" and "eyes", allowing it to visually evaluate action outcomes and self-reflect on failures. These reflections are stored in a short-term memory (STM), enabling the robot to quickly adapt its behavior during an ongoing task. Upon task completion, the robot summarizes the lessons learned into its long-term memory (LTM). When facing a new task, it can use retrieval-augmented generation (RAG) to plan more grounded action sequences by drawing on relevant past experiences and knowledge. Experiments on four challenging robotic tasks show that STM-based self-reflection increases task success rates from 35% to 84%, with intelligent object interactions emerging along the way. In 12 real-world scenarios (including eight previously unseen tasks), the robot effectively learns from the LTM and improves single-trial success rates from 22% to 80%, with RAG outperforming naive prompting. These results highlight the effectiveness and generalizability of PragmaBot. Project webpage: https://pragmabot.github.io/