Most people who have tried to learn a foreign language have experienced difficulty understanding or speaking with a native speaker's accent. For native speakers, understanding or speaking an unfamiliar accent is likewise difficult. An accent conversion system that changes a speaker's accent while preserving that speaker's voice identity, such as timbre and pitch, has potential applications in communication, language learning, and entertainment. Existing accent conversion models tend to change the speaker identity and the accent at the same time. Here, we use adversarial learning to disentangle accent-dependent features while retaining other acoustic characteristics. What sets our work apart from existing accent conversion models is the ability to convert an unseen speaker's utterance to multiple accents while preserving its original voice identity. Subjective evaluations show that our model generates audio that sounds closer to the target accent while still sounding like the original speaker.
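To show what "adversarial learning to disentangle accent-dependent features" can mean in practice, here is one common formulation. This is an assumption, because the abstract does not specify the actual losses. An accent classifier is trained to predict the accent from the encoder's latent representation, and the encoder is penalized whenever that classifier succeeds. The function names `cross_entropy` and `adversarial_losses` and the weight `lam` are hypothetical. The sketch only computes the two opposing per-example losses in plain Python. In a deep-learning framework, the minus sign on the encoder side is often implemented with a gradient reversal layer.

```python
import math
from typing import Sequence, Tuple


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-softmax probability of `label`, computed stably."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def adversarial_losses(
    recon_loss: float,
    accent_logits: Sequence[float],
    accent_label: int,
    lam: float = 1.0,
) -> Tuple[float, float]:
    """Hypothetical per-example losses for adversarial accent disentanglement.

    accent_logits: output of an accent classifier applied to the encoder's
    latent code (assumed setup, not taken from the paper).
    Returns (encoder_loss, classifier_loss).
    """
    ce = cross_entropy(accent_logits, accent_label)
    # The classifier tries to recover the accent from the latent code.
    classifier_loss = ce
    # The encoder tries to reconstruct the speech while *increasing* the
    # classifier's error, pushing accent information out of the latent code.
    encoder_loss = recon_loss - lam * ce
    return encoder_loss, classifier_loss
```

When the classifier is uninformative (equal logits over two accents), its loss is log 2. The encoder's loss is then the reconstruction loss minus `lam * log 2`. Each loss is minimized by its own parameters in alternation, which gives the usual min-max game.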