We introduce OpenVoice, a versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. OpenVoice represents a significant advance on the following open challenges in the field: 1) Flexible Voice Style Control. OpenVoice enables granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, in addition to replicating the tone color of the reference speaker. The voice styles are neither copied directly from nor constrained by the style of the reference speaker. Previous approaches lacked the ability to flexibly manipulate voice styles after cloning. 2) Zero-Shot Cross-Lingual Voice Cloning. OpenVoice achieves zero-shot cross-lingual voice cloning for languages not included in the massive-speaker training set. Unlike previous approaches, which typically require an extensive massive-speaker multi-lingual (MSML) dataset for every language, OpenVoice can clone voices into a new language without any massive-speaker training data for that language. OpenVoice is also computationally efficient, costing tens of times less than commercially available APIs that deliver inferior performance. To foster further research in the field, we have made the source code and trained model publicly accessible. We also provide qualitative results on our demo website. Prior to its public release, our internal version of OpenVoice was used tens of millions of times by users worldwide between May and October 2023, serving as the backend of MyShell.