Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models

From arXiv. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Submitted to the 2023 IEEE-RAS International Conference on Humanoid Robots (Humanoids). Supplementary video available at https://youtu.be/y5O2mRGtsLM

Natural-language dialog is key for intuitive human-robot interaction. It can be used not only to express humans' intents but also to communicate instructions for improvement when a robot does not understand a command correctly. It is therefore important to endow robots with the ability to learn from such interaction experience incrementally, so that they can improve their behavior and avoid repeating mistakes. In this paper, we propose a system that achieves incremental learning of complex behavior from natural interaction and demonstrate its implementation on a humanoid robot. Building on recent advances, the system deploys Large Language Models (LLMs) for high-level orchestration of the robot's behavior: the LLM generates Python statements in an interactive console to invoke both robot perception and action. The interaction loop is closed by feeding human instructions, environment observations, and execution results back to the LLM, which informs the generation of the next statement. Specifically, we introduce incremental prompt learning, which enables the system to learn interactively from its mistakes. To this end, the LLM can call another LLM that is responsible for code-level improvements of the current interaction based on human feedback. The improved interaction is then saved in the robot's memory and retrieved for similar requests. We integrate the system into the cognitive architecture of the humanoid robot ARMAR-6 and evaluate our methods both quantitatively (in simulation) and qualitatively (in simulation and in the real world) by demonstrating generalized incrementally learned knowledge.
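To make the described loop concrete, the following is a minimal sketch under stated assumptions, not the system's actual implementation. All names (`InteractionMemory`, `run_interaction`, `scripted_llm`, `improve_stub`, and the toy `detect_objects`/`grasp` functions) are hypothetical. A scripted function stands in for the LLM, a string substitution stands in for the second, improving LLM, and plain string similarity stands in for whatever retrieval the robot's memory uses.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Tuple

# A transcript is a list of (statement, result) pairs, like a console session.
Transcript = List[Tuple[str, str]]
Example = Tuple[str, List[str]]  # (request, statements of a stored interaction)


@dataclass
class InteractionMemory:
    """Hypothetical memory: stores interactions and retrieves similar ones."""
    examples: List[Example] = field(default_factory=list)

    def save(self, request: str, statements: List[str]) -> None:
        self.examples.append((request, statements))

    def retrieve(self, request: str, k: int = 1) -> List[Example]:
        # Assumption: string similarity replaces the real retrieval mechanism.
        def score(example: Example) -> float:
            return SequenceMatcher(None, request.lower(), example[0].lower()).ratio()
        return sorted(self.examples, key=score, reverse=True)[:k]


def run_interaction(
    request: str,
    llm: Callable[[str, List[Example], Transcript], Optional[str]],
    api: Dict[str, Callable],
    memory: InteractionMemory,
    max_steps: int = 10,
) -> Transcript:
    """Closed loop: the LLM proposes one statement, it is evaluated, and the
    result (or error message) is fed back before the next statement."""
    examples = memory.retrieve(request)
    transcript: Transcript = []
    for _ in range(max_steps):
        statement = llm(request, examples, transcript)
        if statement is None:  # the LLM signals that it is done
            break
        try:
            # Only the robot API is visible to the generated expression.
            result = repr(eval(statement, {"__builtins__": {}}, api))
        except Exception as exc:
            result = f"{type(exc).__name__}: {exc}"
        transcript.append((statement, result))
    return transcript


def scripted_llm(request: str, examples: List[Example], transcript: Transcript) -> Optional[str]:
    """Toy stand-in for an LLM: replays the retrieved example or a default plan."""
    plan = examples[0][1] if examples else ["detect_objects()", "grasp('cup')"]
    return plan[len(transcript)] if len(transcript) < len(plan) else None


def improve_stub(transcript: Transcript, fix: Tuple[str, str]) -> List[str]:
    """Toy stand-in for the improving LLM: applies one human correction."""
    old, new = fix
    return [statement.replace(old, new) for statement, _ in transcript]


# Toy robot API (hypothetical perception and action functions).
robot_api = {
    "detect_objects": lambda: ["cup", "sponge"],
    "grasp": lambda name: f"grasped {name}",
}
```

A usage example: the first request triggers the default (wrong) plan, the human's correction is applied and stored, and a similar later request retrieves the improved interaction.

```python
memory = InteractionMemory()
first = run_interaction("wipe the table", scripted_llm, robot_api, memory)
memory.save("wipe the table", improve_stub(first, ("cup", "sponge")))
second = run_interaction("please wipe the table", scripted_llm, robot_api, memory)
```

Evaluating generated statements one at a time is what lets each result, including error messages, inform the next statement. A real deployment would need proper sandboxing rather than a restricted `eval`.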
