Voice Conversion for Stuttered Speech, Instruments, Unseen Languages and Textually Described Voices

Voice conversion aims to convert source speech into a target voice using recordings of the target speaker as a reference. Newer models produce increasingly realistic output. But what happens when models are fed non-standard data, such as speech from a user with a speech impairment? We investigate how a recent voice conversion model performs on non-standard downstream voice conversion tasks. We use a simple but robust approach called k-nearest neighbors voice conversion (kNN-VC). We look at four non-standard applications: stuttered voice conversion, cross-lingual voice conversion, musical instrument conversion, and text-to-voice conversion. The last of these involves converting to a target voice specified through a text description, e.g. "a young man with a high-pitched voice". Compared to an established baseline, we find that kNN-VC retains high performance in stuttered and cross-lingual voice conversion. Results are more mixed for the musical instrument and text-to-voice conversion tasks. For example, kNN-VC works well on some instruments like drums but not on others. Nevertheless, this shows that voice conversion models, and kNN-VC in particular, are increasingly applicable in a range of non-standard downstream tasks, although limitations remain when samples are very far from the training distribution. Code, samples, trained models: https://rf5.github.io/sacair2023-knnvc-demo/.
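
The abstract does not spell out how kNN-VC works. As a rough illustration only, the sketch below assumes the commonly described formulation in which each frame-level feature of the source utterance is replaced by the average of its k nearest frames from the target reference, using cosine similarity. Feature extraction (a self-supervised speech encoder) and vocoding back to a waveform are omitted, and the function name `knn_convert`, the default `k=4`, and the random arrays standing in for features are all assumptions made for this example, not the authors' code.

```python
import numpy as np

def knn_convert(source_feats, target_feats, k=4):
    """Replace each source frame with the mean of its k nearest target frames.

    source_feats: (n_src, dim) array of frame-level features (hypothetical input).
    target_feats: (n_tgt, dim) array of features from the target reference.
    Nearness is measured by cosine similarity.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    src = source_feats / np.linalg.norm(source_feats, axis=1, keepdims=True)
    tgt = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    sim = src @ tgt.T  # shape (n_src, n_tgt)

    # Indices of the k most similar target frames for every source frame.
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]

    # Average the selected (unnormalized) target frames: shape (n_src, dim).
    return target_feats[idx].mean(axis=1)

# Illustrative usage with random stand-in features.
rng = np.random.default_rng(0)
source = rng.normal(size=(5, 8))
target = rng.normal(size=(20, 8))
converted = knn_convert(source, target, k=4)
```

Because the output is built entirely from target-reference frames, the converted features carry the target's voice characteristics while following the source's frame sequence. This is also one plausible reading of why results degrade when the source lies far from the training distribution: the nearest neighbors may then be poor matches.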
