We introduce OpenVoice, a versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. OpenVoice represents a significant advancement in addressing the following open challenges in the field: 1) Flexible Voice Style Control. OpenVoice enables granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, in addition to replicating the tone color of the reference speaker. The voice styles are neither copied directly from the reference speaker nor constrained by that speaker's style. Previous approaches lacked the ability to flexibly manipulate voice styles after cloning. 2) Zero-Shot Cross-Lingual Voice Cloning. OpenVoice achieves zero-shot cross-lingual voice cloning for languages not included in the massive-speaker training set. Unlike previous approaches, which typically require an extensive massive-speaker multi-lingual (MSML) dataset for all languages, OpenVoice can clone voices into a new language without any massive-speaker training data for that language. OpenVoice is also computationally efficient, costing tens of times less than commercially available APIs that deliver inferior performance. To foster further research in the field, we have made the source code and trained model publicly accessible. We also provide qualitative results on our demo website. Prior to its public release, our internal version of OpenVoice was used tens of millions of times by users worldwide between May and October 2023, serving as the backend of MyShell.