This paper addresses the challenging task of evaluating socially situated conversational robots and presents a novel objective evaluation approach based on multimodal user behaviors. We focus on the robot's human-likeness as the primary evaluation metric. Whereas previous research has often relied on subjective user evaluations, our approach evaluates human-likeness indirectly from observable user behaviors, with the aim of improving objectivity and reproducibility. We first created a dataset annotated with human-likeness scores, drawing on user behaviors in an attentive listening dialogue corpus. We then analyzed the correlation between multimodal user behaviors and the human-likeness scores, demonstrating the feasibility of the proposed behavior-based evaluation method.
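As a rough illustration of the correlation analysis described in the abstract, the following minimal sketch computes a Pearson coefficient between each user-behavior feature and the human-likeness score. The abstract does not state which correlation coefficient or which behavior features were used, so the choice of Pearson, the feature names (`utterance_sec`, `laugh_count`), and the toy values are all assumptions made for illustration only; none of them come from the corpus.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlate_features(sessions, target="human_likeness"):
    """Correlate every non-target field with the target score across sessions."""
    features = [key for key in sessions[0] if key != target]
    scores = [s[target] for s in sessions]
    return {f: pearson([s[f] for s in sessions], scores) for f in features}


# Hypothetical per-session records: made-up values, NOT data from the paper.
sessions = [
    {"utterance_sec": 95.0, "laugh_count": 0, "human_likeness": 2.0},
    {"utterance_sec": 140.0, "laugh_count": 2, "human_likeness": 3.5},
    {"utterance_sec": 180.0, "laugh_count": 1, "human_likeness": 4.0},
    {"utterance_sec": 210.0, "laugh_count": 4, "human_likeness": 5.0},
]

if __name__ == "__main__":
    for feature, r in correlate_features(sessions).items():
        print(f"{feature}: r = {r:.3f}")
```

A rank-based coefficient such as Spearman's may suit ordinal annotation scores better; Pearson is used here only to keep the sketch free of external dependencies.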