This paper presents a large language model (LLM)-based robotic system for enhancing multi-modal human-robot interaction (HRI). Traditional HRI systems have relied on complex, resource-intensive designs for intent estimation, reasoning, and behavior generation. In contrast, our system lets researchers and practitioners regulate robot behavior through three key aspects: providing high-level linguistic guidance, creating "atomics" for the actions and expressions the robot can use, and offering a set of examples. Implemented on a physical robot, it adapts to multi-modal inputs and determines the appropriate manner of action to assist humans with its arms, following researcher-defined guidelines. At the same time, it coordinates the robot's lid, neck, and ear movements with speech output to produce dynamic, multi-modal expressions. This showcases the system's potential to shift HRI from conventional, manual state-and-flow design methods to an intuitive, guidance-based, and example-driven approach.
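To make the three regulation aspects concrete, the following is a minimal sketch of how guidance, atomics, and examples might be combined into a single LLM prompt, and how a chosen atomic might then be dispatched. It is an illustration only, not the system's actual implementation: the names `Atomic`, `build_prompt`, and `dispatch`, the prompt layout, and the sample atomics are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Atomic:
    """Hypothetical atomic action or expression the robot can invoke."""
    name: str
    description: str
    run: Callable[..., str]


def build_prompt(guidance: str, atomics: List[Atomic], examples: List[str]) -> str:
    """Assemble guidance, atomic descriptions, and examples into one prompt."""
    lines = ["# Guidance", guidance, "", "# Available atomics"]
    lines += [f"- {a.name}: {a.description}" for a in atomics]
    lines += ["", "# Examples"]
    lines += examples
    return "\n".join(lines)


def dispatch(call_name: str, registry: Dict[str, Atomic], **kwargs) -> str:
    """Run the atomic selected by the model; reject names that were never registered."""
    if call_name not in registry:
        raise KeyError(f"unknown atomic: {call_name}")
    return registry[call_name].run(**kwargs)


# Illustrative atomics: here they only return a string instead of moving hardware.
atomics = [
    Atomic("speak", "Say a sentence aloud.", lambda text: f"speak({text})"),
    Atomic("move_ears", "Move the ears to a named pose.", lambda pose: f"move_ears({pose})"),
]
registry = {a.name: a for a in atomics}

prompt = build_prompt(
    guidance="Help only when the person cannot proceed alone.",
    atomics=atomics,
    examples=["Person reaches for a cup that is too far away -> hand over the cup."],
)
result = dispatch("speak", registry, text="Here you go")
```

In this sketch, changing the robot's behavior means editing the guidance string, the atomic list, or the examples, rather than redesigning a state machine by hand.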