People employ expressive behaviors to communicate and coordinate their actions with others, such as nodding to acknowledge a person glancing at them or saying "excuse me" to pass people in a busy corridor. We would like robots to also demonstrate expressive behaviors in human-robot interaction. Prior work proposes rule-based methods that struggle to scale to new communication modalities or social situations, while data-driven methods require specialized datasets for each social situation the robot is used in. We propose to leverage the rich social context available from large language models (LLMs), together with their ability to generate motion based on instructions or user preferences, to generate expressive robot motion that is adaptable and composable, with behaviors that build upon each other. Our approach uses few-shot chain-of-thought prompting to translate human language instructions into parametrized control code using the robot's available and learned skills. Through user studies and simulation experiments, we demonstrate that our approach produces behaviors that users find competent and easy to understand. Supplementary material can be found at https://generative-expressive-motion.github.io/.
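The few-shot chain-of-thought prompting step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Skill` and `Example` types, the `build_prompt` helper, the skill names `nod` and `say`, and the prompt wording are all hypothetical. The sketch only assembles the prompt text; sending it to an LLM and executing the returned control code are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """A robot skill that generated code may call (hypothetical names)."""
    signature: str
    description: str


@dataclass(frozen=True)
class Example:
    """One few-shot example: instruction, reasoning, and resulting code."""
    instruction: str
    reasoning: str
    code: str


def build_prompt(skills, examples, instruction):
    """Assemble a few-shot chain-of-thought prompt as plain text."""
    lines = ["You control a robot. Use only these skills:"]
    for skill in skills:
        lines.append(f"- {skill.signature}: {skill.description}")
    for ex in examples:
        lines.append("")
        lines.append(f"Instruction: {ex.instruction}")
        lines.append(f"Reasoning: {ex.reasoning}")
        lines.append("Code:")
        lines.append(ex.code)
    # The final block is left open so the model continues with its own
    # reasoning first and the parametrized control code second.
    lines.append("")
    lines.append(f"Instruction: {instruction}")
    lines.append("Reasoning:")
    return "\n".join(lines)


# Illustrative skill library and a single worked example (assumed, not
# the paper's actual skills or prompts).
SKILLS = [
    Skill("nod(times: int)", "tilt the head down and up"),
    Skill("say(text: str)", "speak a short phrase"),
]

EXAMPLES = [
    Example(
        instruction="Acknowledge a person glancing at you.",
        reasoning="A glance is usually acknowledged with a brief nod.",
        code="nod(times=1)",
    ),
]

prompt = build_prompt(SKILLS, EXAMPLES, "Pass people in a busy corridor.")
```

Ending the prompt on an open `Reasoning:` line is one common way to elicit step-by-step reasoning before the code; listing the skill signatures up front is meant to keep the generated code within what the robot can actually execute.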