This paper tackles the challenging task of evaluating socially situated conversational robots and presents a novel objective evaluation approach based on multimodal user behaviors. In this study, we focus on the robot's human-likeness as the primary evaluation metric. Whereas previous research has often relied on users' subjective evaluations, our approach aims to evaluate the robot's human-likeness indirectly from observable user behaviors, thereby enhancing objectivity and reproducibility. First, we created a dataset annotated with human-likeness scores, based on user behaviors observed in an attentive listening dialogue corpus. We then analyzed the correlation between multimodal user behaviors and human-likeness scores, demonstrating the feasibility of the proposed behavior-based evaluation method.
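The abstract does not state which correlation measure, which behavior features, or which unit of analysis the study uses. The following is only a minimal sketch of the kind of correlation analysis described. It assumes one human-likeness score per dialogue session and one numeric value per behavior feature per session, and it uses Spearman rank correlation, which suits ordinal rating scores. The feature names (`backchannel_rate`, `gaze_on_robot_ratio`) and all numbers are hypothetical toy values, not data from the corpus.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlate_behaviors(features, scores):
    """Map each (hypothetical) behavior feature to its rank correlation with the scores."""
    return {name: spearman(values, scores) for name, values in features.items()}


if __name__ == "__main__":
    # Hypothetical per-session values, for illustration only.
    scores = [2, 3, 5, 4, 1]  # human-likeness score per session
    features = {
        "backchannel_rate": [0.8, 1.1, 1.9, 1.5, 0.4],
        "gaze_on_robot_ratio": [0.6, 0.5, 0.7, 0.7, 0.4],
    }
    for name, rho in correlate_behaviors(features, scores).items():
        print(f"{name}: rho = {rho:.2f}")
```

A rank-based coefficient is used here because it requires only that a feature increase or decrease monotonically with the score, not linearly. Whether the study itself uses this measure is not stated in the abstract.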