Representations of AI agents in user interfaces and robotics are predominantly White, not only in their facial and skin features but also in the synthetic voices they use. In this paper we explore some unexpected challenges in the representation of race that we encountered while developing a U.S. English Text-to-Speech (TTS) system intended to sound like an educated, professional African American woman with no regional accent. The paper starts by presenting the results of focus groups with African American IT professionals, in which guidelines and challenges for creating a representative and appropriate TTS system were discussed and gathered, followed by a discussion of some of the technical difficulties faced by the TTS system developers. We then describe two studies with U.S. English speakers in which participants were unable to attribute the correct race to the African American TTS voice, while overwhelmingly recognizing the race of a White TTS voice of similar quality. A focus group with African American IT workers not only confirmed the representativeness of the African American voice we built, but also suggested that the surprising recognition results may have been caused by the inability of non-African Americans to associate educated, non-vernacular, professional-sounding voices with African American people, or by a latent prejudice against doing so.