Speech is a natural interface for humans to interact with robots. Yet aligning a robot's voice with its appearance is challenging because both modalities have a rich vocabulary. Previous research has explored only a few labels to describe robots and tested them on a limited number of robots and existing voices. Here, we develop a robot-voice creation tool and then run large-scale behavioral experiments with human participants (N=2,505). First, participants collectively tune robotic voices to match 175 robot images using an adaptive human-in-the-loop pipeline. Then, participants describe their impressions of the robot or its matched voice using another human-in-the-loop paradigm for open-ended labeling. The elicited taxonomy is then used to rate robot attributes and to predict the best voice for an unseen robot. We offer a web interface to help engineers customize robot voices, demonstrating the synergy between cognitive science and machine learning for engineering tools.