SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation

Deep generative models have achieved significant progress in speech synthesis, but high-fidelity singing voice synthesis remains an open problem because of its long continuous pronunciation, rich high-frequency content, and strong expressiveness. Existing neural vocoders designed for text-to-speech cannot be applied directly to singing voice synthesis because they produce glitches and reconstruct high frequencies poorly. In this work, we propose SingGAN, a generative adversarial network designed for high-fidelity singing voice synthesis. Specifically, 1) to alleviate the glitch problem in the generated samples, we propose source excitation with adaptive feature learning filters to expand the receptive field patterns and stabilize long continuous signal generation; 2) SingGAN introduces global and local discriminators at different scales to enrich low-frequency details and promote high-frequency reconstruction; and 3) to improve training efficiency, SingGAN includes auxiliary spectrogram losses and a sub-band feature matching penalty loss. To the best of our knowledge, SingGAN is the first work designed toward high-fidelity singing voice vocoding. In our evaluation, SingGAN achieves state-of-the-art results with higher-quality samples (MOS 4.05). SingGAN also synthesizes audio 50x faster than real time on a single NVIDIA 2080Ti GPU. We further show that SingGAN generalizes well to mel-spectrogram inversion for unseen singers, and that the end-to-end singing voice synthesis system SingGAN-SVS uses a two-stage pipeline to transform music scores into expressive singing voices. Audio samples are available at https://SingGAN.github.io/
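
The abstract names auxiliary spectrogram losses without giving their form. The following is a minimal sketch of one common choice in GAN vocoders, a multi-resolution L1 distance between log-magnitude STFTs; it is not confirmed to be SingGAN's formulation. The function names, the (frame length, hop) pairs, and the `eps` constant are illustrative assumptions.

```python
import numpy as np


def stft_magnitude(signal, frame_len, hop):
    """Magnitude spectrogram from Hann-windowed frames (no padding)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft over each frame -> shape (n_frames, frame_len // 2 + 1)
    return np.abs(np.fft.rfft(frames, axis=1))


def multi_resolution_spectrogram_loss(
    real,
    fake,
    resolutions=((512, 128), (1024, 256), (2048, 512)),  # assumed values
    eps=1e-7,
):
    """Mean L1 distance between log-magnitude spectrograms, averaged
    over several (frame_len, hop) resolutions."""
    total = 0.0
    for frame_len, hop in resolutions:
        real_mag = stft_magnitude(real, frame_len, hop)
        fake_mag = stft_magnitude(fake, frame_len, hop)
        total += np.mean(np.abs(np.log(real_mag + eps) - np.log(fake_mag + eps)))
    return total / len(resolutions)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(8192) / 24000.0            # assumed 24 kHz sample rate
    real = np.sin(2 * np.pi * 440.0 * t)     # stand-in for a recorded waveform
    fake = real + 0.05 * rng.standard_normal(real.shape)  # stand-in for generator output
    print(multi_resolution_spectrogram_loss(real, fake))
```

Using several window sizes trades time resolution against frequency resolution, so a generator cannot satisfy the loss by matching only one of them. In training, the same computation would be written in a differentiable framework and added to the adversarial objective.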
