StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

Style transfer for out-of-domain (OOD) singing voice synthesis (SVS) aims to generate high-quality singing voices with unseen styles (such as timbre, emotion, pronunciation, and articulation skills) derived from reference singing voice samples. Modeling the nuances of singing voice styles is difficult, however, because singing voices are highly expressive. Moreover, existing SVS methods degrade in quality in OOD scenarios, because they assume that the target vocal attributes are discernible during the training phase. To address these challenges, we propose StyleSinger, the first singing voice synthesis model for zero-shot style transfer of out-of-domain reference singing voice samples. StyleSinger incorporates two key components: 1) the Residual Style Adaptor (RSA), which employs a residual quantization module to capture diverse style characteristics in singing voices, and 2) Uncertainty Modeling Layer Normalization (UMLN), which perturbs the style attributes within the content representation during the training phase and thus improves model generalization. Our extensive evaluations in zero-shot style transfer show that StyleSinger outperforms baseline models in both audio quality and similarity to the reference singing voice samples. Singing voice samples are available at https://stylesinger.github.io/.
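The abstract names residual quantization as the mechanism inside the RSA but does not describe it further. As a rough illustration of the general technique only (not the authors' implementation), the sketch below quantizes a vector in stages, with each stage encoding the residual left by the previous ones. The function names, the random (untrained) codebooks, and the all-zero entry added to each codebook are assumptions made for this toy example; a real model would learn its codebooks.

```python
import random


def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared Euclidean)."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def residual_quantize(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the remaining residual.

    Returns the chosen index per stage and the summed approximation.
    """
    residual = list(vec)
    approx = [0.0] * len(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        code = codebook[idx]
        indices.append(idx)
        # Accumulate the chosen code and subtract it from what is left to encode.
        approx = [a + c for a, c in zip(approx, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, approx


def make_codebooks(dim, num_stages, size, seed=0):
    """Hypothetical helper: random codebooks whose range halves at each stage.

    Assumption for this toy: every codebook also holds an all-zero entry,
    so a stage can always leave the residual unchanged instead of worsening it.
    """
    rng = random.Random(seed)
    books = []
    for stage in range(num_stages):
        scale = 0.5 ** stage
        book = [[rng.uniform(-scale, scale) for _ in range(dim)]
                for _ in range(size - 1)]
        book.append([0.0] * dim)
        books.append(book)
    return books


if __name__ == "__main__":
    books = make_codebooks(dim=4, num_stages=3, size=8)
    style_vec = [0.3, -0.2, 0.7, 0.1]  # stand-in for a style embedding
    for k in range(1, len(books) + 1):
        idx, approx = residual_quantize(style_vec, books[:k])
        err = sum((a - b) ** 2 for a, b in zip(style_vec, approx))
        print(k, idx, round(err, 4))
```

Because each stage only has to encode what earlier stages missed, several small codebooks can represent a vector more finely than a single codebook of the same size, which is one plausible reason to use the technique for varied style attributes.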

