Recently, under the umbrella of Responsible AI, efforts have been made to develop gender-ambiguous synthetic speech that represents all individuals across the gender spectrum with a single voice. However, research efforts have completely overlooked speaking style, despite the differences reported between binary and non-binary populations. In this work, we synthesise gender-ambiguous speech by combining the timbre of a male speaker with the manner of speech of a female speaker, using voice morphing and pitch shifting towards the male-female boundary. Subjective evaluations indicate that morphed samples conveying the female speaking style are perceived as more gender-ambiguous than samples that undergo plain pitch transformations, suggesting that speaking style can contribute to the creation of gender-ambiguous speech. To our knowledge, this is the first study to explicitly use speaking-style transfer to create gender-ambiguous voices.
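The abstract does not state how the male-female boundary is defined or how the pitch shift is applied. The sketch below is only an illustration under stated assumptions, not the method used in this work: it assumes (hypothetically) that the boundary is the geometric mean of the two speakers' median F0 values, i.e. their midpoint on a semitone scale, and that the shift is a uniform scaling of a frame-level F0 contour. The helper names `boundary_f0`, `semitones` and `shift_contour`, and all F0 values, are hypothetical and chosen for illustration.

```python
import math
from statistics import median


def boundary_f0(male_f0_hz: float, female_f0_hz: float) -> float:
    """Hypothetical male-female boundary: the geometric mean of two median F0s,
    which is the midpoint between them on a log (semitone) scale."""
    return math.sqrt(male_f0_hz * female_f0_hz)


def semitones(f_from_hz: float, f_to_hz: float) -> float:
    """Signed pitch shift in semitones needed to move f_from_hz to f_to_hz."""
    return 12.0 * math.log2(f_to_hz / f_from_hz)


def shift_contour(f0_contour, n_semitones: float):
    """Scale a frame-level F0 contour by n_semitones.

    Unvoiced frames (F0 == 0) stay at 0, so only voiced pitch moves and the
    relative intonation pattern of the contour is preserved."""
    ratio = 2.0 ** (n_semitones / 12.0)
    return [f * ratio if f > 0 else 0.0 for f in f0_contour]


if __name__ == "__main__":
    # Illustrative values only (assumed, not taken from the study).
    male_contour = [0.0, 110.0, 118.0, 125.0, 0.0, 120.0]
    male_median = median(f for f in male_contour if f > 0)
    female_median = 210.0

    target = boundary_f0(male_median, female_median)
    n = semitones(male_median, target)
    shifted = shift_contour(male_contour, n)
    print(f"boundary={target:.1f} Hz, shift={n:+.2f} st")
    print([round(f, 1) for f in shifted])
```

Defining the boundary on a log scale means the male and female voices would need shifts of equal size in opposite directions to reach it, which matches how pitch is perceived more closely than a linear midpoint in Hz. A practical system would apply the resulting ratio through a vocoder or a pitch-synchronous method rather than by editing a list of F0 values.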