In this paper, we investigated how the choice of a Wizard-of-Oz (WoZ) interface affects communication with a robot from both the user's and the wizard's perspective. In a conversational setting, we used three WoZ interfaces with varying levels of dialogue input and output restrictions: a) a restricted perception GUI that showed fixed-view video and ASR transcripts and let the wizard trigger pre-scripted utterances and gestures; b) an unrestricted perception GUI that added real-time audio from the participant and the robot; and c) a VR telepresence interface that streamed immersive stereo video and audio to the wizard and forwarded the wizard's spontaneous speech, gaze, and facial expressions to the robot. Users preferred the interaction mediated by the VR interface in terms of robot features and perceived social presence. For the wizards, the VR condition was the most demanding but elicited a stronger social connection with the users. The VR interface also induced the most connected interaction in terms of inter-speaker gaps and overlaps, while the restricted GUI induced the least connected flow and the longest silences. Given these results, we argue for more WoZ studies using telepresence interfaces. Such studies better reflect the robots of tomorrow and offer a promising path to automation based on naturalistic, contextualized verbal and non-verbal behavioral data.