We introduce OpenVoice, a versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. OpenVoice represents a significant advance on two open challenges in the field: 1) Flexible Voice Style Control. OpenVoice enables granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, in addition to replicating the tone color of the reference speaker. The voice styles are neither copied directly from nor constrained by the style of the reference speaker. Previous approaches lacked the ability to flexibly manipulate voice styles after cloning. 2) Zero-Shot Cross-Lingual Voice Cloning. OpenVoice achieves zero-shot cross-lingual voice cloning for languages not included in the massive-speaker training set. Unlike previous approaches, which typically require an extensive massive-speaker multi-lingual (MSML) dataset covering every language, OpenVoice can clone voices into a new language without any massive-speaker training data for that language. OpenVoice is also computationally efficient, costing tens of times less than commercially available APIs that deliver inferior performance. To foster further research in the field, we have made the source code and trained model publicly accessible. We also provide qualitative results on our demo website. Prior to its public release, our internal version of OpenVoice was used tens of millions of times by users worldwide between May and October 2023, serving as the backend of MyShell.