This paper introduces EMOTION, a framework for generating expressive motion sequences in humanoid robots, enhancing their ability to engage in humanlike non-verbal communication. Non-verbal cues such as facial expressions, gestures, and body movements play a crucial role in effective interpersonal interactions. Despite advancements in robotic behaviors, existing methods often fall short of mimicking the diversity and subtlety of human non-verbal communication. To address this gap, our approach leverages the in-context learning capability of large language models (LLMs) to dynamically generate socially appropriate gesture motion sequences for human-robot interaction. We use this framework to generate 10 different expressive gestures and conduct online user studies comparing the naturalness and understandability of the motions generated by EMOTION and its human-feedback version, EMOTION++, against those produced by human operators. The results demonstrate that our approach matches or surpasses human performance in generating understandable and natural robot motions in certain scenarios. We also provide design implications for future research, identifying a set of variables to consider when generating expressive robotic gestures.