This paper presents an innovative large language model (LLM)-based robotic system for enhancing multi-modal human-robot interaction (HRI). Traditional HRI systems rely on complex designs for intent estimation, reasoning, and behavior generation, all of which are resource-intensive to build and maintain. In contrast, our system lets researchers and practitioners regulate robot behavior through three key aspects: providing high-level linguistic guidance, defining the "atomic actions" and expressions the robot can use, and supplying a set of examples. Implemented on a physical robot, the system adapts to multi-modal inputs and determines the appropriate manner of action to assist humans with its arms, following the guidelines defined by researchers. Simultaneously, it coordinates the robot's lid, neck, and ear movements with speech output to produce dynamic, multi-modal expressions. This demonstrates the system's potential to shift HRI from conventional, manually designed state-and-flow methods to an intuitive, guidance-based, example-driven approach. Supplementary material can be found at https://hri-eu.github.io/Lami/
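To make the three regulation channels concrete, the following minimal Python sketch shows how high-level guidance, an atomic-action vocabulary, and few-shot examples might be assembled into a single LLM prompt. All identifiers here (GUIDANCE, ATOMIC_ACTIONS, EXAMPLES, build_prompt, and the action names) are hypothetical illustrations under the assumptions stated in the comments, not the paper's actual implementation.

```python
# Hypothetical sketch: regulating robot behavior through (1) high-level
# linguistic guidance, (2) a fixed vocabulary of "atomic actions" and
# expressions, and (3) a set of examples, combined into one LLM prompt.

GUIDANCE = (
    "You control a robot that assists humans. "
    "Be proactive, but ask before touching a person's belongings."
)

# Atomic actions and expressions the robot can use; the LLM is instructed
# to compose its responses only from this vocabulary.
ATOMIC_ACTIONS = {
    "grasp(object)": "pick up the named object with the arm",
    "handover(object)": "pass the held object to the human",
    "nod()": "tilt the neck forward briefly",
    "wiggle_ears()": "move both ears to signal attentiveness",
    "say(text)": "speak the given text aloud",
}

# Few-shot examples pairing multi-modal observations with action sequences.
EXAMPLES = [
    ("Human points at the cup and says 'I'm thirsty'.",
     "say('Here you go!'); grasp(cup); handover(cup)"),
    ("Human waves hello.",
     "wiggle_ears(); nod(); say('Hello!')"),
]

def build_prompt(observation: str) -> str:
    """Assemble guidance, the action vocabulary, and examples into one prompt."""
    actions = "\n".join(f"- {sig}: {desc}" for sig, desc in ATOMIC_ACTIONS.items())
    shots = "\n".join(f"Observation: {obs}\nActions: {act}" for obs, act in EXAMPLES)
    return (
        f"{GUIDANCE}\n\nAvailable atomic actions:\n{actions}\n\n"
        f"Examples:\n{shots}\n\nObservation: {observation}\nActions:"
    )

print(build_prompt("Human gestures toward the bottle while speaking."))
```

In such a design, constraining the LLM to a named action vocabulary keeps its output directly executable on the robot, while the guidance text and examples replace hand-crafted state-and-flow logic.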