People employ expressive behaviors to communicate effectively and coordinate their actions with others, such as nodding to acknowledge a person glancing at them or saying "excuse me" to pass people in a busy corridor. We would like robots to demonstrate such expressive behaviors in human-robot interaction as well. Prior work proposes rule-based methods, which struggle to scale to new communication modalities or social situations, and data-driven methods, which require specialized datasets for each social situation the robot is used in. We propose to leverage the rich social context available from large language models (LLMs), and their ability to generate motion based on instructions or user preferences, to generate expressive robot motion that is adaptable and composable, with behaviors building upon one another. Our approach uses few-shot chain-of-thought prompting to translate human language instructions into parametrized control code that draws on the robot's available and learned skills. Through user studies and simulation experiments, we demonstrate that our approach produces behaviors that users find competent and easy to understand. Supplementary material can be found at https://generative-expressive-motion.github.io/.