This paper introduces CocoNut-Humoresque, an open-source, large-scale speech likability corpus that includes speech segments and their per-listener likability scores. Evaluating voice likability is essential for designing preferable voices for speech systems such as dialogue and announcement systems. In this study, we had 885 listeners rate the likability of 1,800 speech segments from a wide range of speakers. When constructing the corpus, we also collected multiple listener attributes: gender, age, and favorite YouTube videos. The corpus therefore enables large-scale statistical analysis of voice likability with respect to both speaker and listener factors. This paper describes the construction methodology and presents a preliminary data analysis that reveals gender and age biases in voice likability. In addition, we investigate the relationship between likability and two acoustic features of the utterances: fundamental frequency (F0) and x-vectors.